Perl: Replace strings in multiple files with array entry - arrays

I am looking for a simple way to replace strings in multiple text files. In the first file the string should be replaced with the first element of the array @arrayF; in the second file the string must be replaced with the second entry, and so on.
I want to replace ;size=\d+ where \d+ is a wildcard for any number.
This is what I have so far:
#!/usr/bin/perl -w
use strict;
use warnings;

my $counter = 0;
my @arrayF = '/Users/majuss/Desktop/filelist.txt>'; # Reads all lines into array
my @files = '/Users/majuss/Desktop/New_Folder/*'; # get Files into an array

foreach my $file ( @files ) {
    $file =~ s/;size=\d+/$arrayF[$counter]/g; # subst.
    print
    $counter++; # increment array index
}
It gives a zero back and nothing happens.
I know how to do it in a one-liner but I can't figure out how to use an array there.

Note these points that I commented on beneath your question:
The line commented Reads all lines into array doesn't do that. It simply sets @arrayF to a one-element list that holds the string /Users/majuss/Desktop/filelist.txt>. You probably need to open the file and read its contents into an array.
The line commented get Files into an array doesn't do that. It simply sets @files to a one-element list that holds the string /Users/majuss/Desktop/New_Folder/*. You probably need to use glob to expand the wildcard into a list of files.
The statement
$file =~ s/;size=\d+/$arrayF[$counter]/g
is attempting to modify the variable $file, which contains the name of the file. Presumably you meant to edit the contents of that file, so you must open and read it first.
Please don't use upper-case letters in your local identifiers
Don't use -w on the shebang line as well as use warnings; just the latter is correct
This seems to do what you're asking for, but be aware that it is untested except that I have checked that it will compile. Be careful that you have a backup of the original files as this code will overwrite the original files with the modified data
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use autodie;

my $replacement_text = '/Users/majuss/Desktop/filelist.txt';
my $file_glob        = '/Users/majuss/Desktop/New_Folder/*';

my @replacement_text = do {
    open my $fh, '<', $replacement_text;
    <$fh>;
};
chomp @replacement_text;

my $i = 0;
for my $file ( glob $file_glob ) {

    my $contents = do {
        open my $in_fh, '<', $file;
        local $/;
        <$in_fh>;
    };

    $contents =~ s/;size=\d+/$replacement_text[$i]/g;

    open my $out_fh, '>', $file;
    print $out_fh $contents;

    ++$i;
}

You're not opening filelist.txt and reading it.
To do this you need to:
open ( my $input, "<", '/Users/majuss/Desktop/filelist.txt' ) or die $!;
my @arrayF = <$input>;
close ( $input );
You need to use glob to search a directory pattern like that.
Like this:
foreach my $file ( glob( '/Users/majuss/Desktop/New_Folder/*' ) ) {
    # stuff
}
To search and replace within a file, it's actually a bit different from a one-liner. You can look at 'in-place editing' in perlrun - but this is where perl is trying to pretend to be sed. I think you can do it if you try - there's an option in perlvar:
$^I
The current value of the inplace-edit extension. Use undef to disable inplace editing.
Mnemonic: value of -i switch.
This answer may give some insight:
In-place editing of multiple files in a directory using Perl's diamond and in-place edit operator
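To make that concrete, here is an untested sketch of the $^I approach, assuming the paths from the question and that the glob order of the files matches the line order in filelist.txt:
#!/usr/bin/perl
use strict;
use warnings;
# Load the replacement strings (path assumed from the question)
open my $fh, '<', '/Users/majuss/Desktop/filelist.txt' or die $!;
chomp( my @arrayF = <$fh> );
close $fh;
$^I   = '.bak';   # enable in-place editing, keeping .bak backups
@ARGV = glob '/Users/majuss/Desktop/New_Folder/*';
my $i = 0;
while (<>) {
    s/;size=\d+/$arrayF[$i]/g;
    print;        # with $^I set, this writes to the replacement file
} continue {
    $i++ if eof;  # bare eof is true at the end of each file in @ARGV
}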
Instead you can:
foreach my $file ( glob( '/Users/majuss/Desktop/New_Folder/*' ) ) {
    open ( my $input_fh,  "<", $file )       or die $!;
    open ( my $output_fh, ">", "$file.NEW" ) or die $!;
    my $replace = shift ( @arrayF );
    while ( my $line = <$input_fh> ) {
        $line =~ s/;size=\d+/$replace/g;
        print {$output_fh} $line;
    }
    close ( $input_fh );
    close ( $output_fh );
    # rename 'output' over the original.
}
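For that final rename step, something like this (untested) at the end of the loop body, after both closes, would move the new file over the original:
rename "$file.NEW", $file or die "Cannot rename $file.NEW: $!";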

Related

Iterate through a file multiple times, each time finding a regex and returning one line (perl)

I have one file with ~90k lines of text in 4 columns.
col1 col2 col3 value1
...
col1 col2 col3 value90000
A second file contains ~200 lines, each one corresponding to a value from column 4 of the larger file.
value1
value2
...
value200
I want to read in each value from the smaller file, find the corresponding line in the larger file, and return that line. I have written a perl script that places all the values from the small file into an array, then iterates through that array using each value as a regex to search through the larger file. After some debugging, I feel like I have it almost working, but my script only returns the line corresponding to the LAST element of the array.
Here is the code I have:
open my $fh1, '<', $file1 or die "Could not open $file1: $!";
my @array = <$fh1>;
close $fh1;

my $count = 0;
while ($count < scalar @array) {
    my $value = $array[$count];
    open my $fh2, '<', $file2 or die "Could not open $file2: $!";
    while (<$fh2>) {
        if ($_ =~ /$value/) {
            my $line = $_;
            print $line;
        }
    }
    close $fh2;
    $count++;
}
This returns only:
col1 col2 col3 value200
I can get it to print each value of the array, so I know it's iterating through properly, but it's not using each value to search the larger file as I intended. I can also plug any of the values from the array into the $value variable and return the appropriate line, so I know the lines are there. I suspect my bug may have to do with either:
newlines in the array elements, since all the elements have a newline except the last one. I've tried chomp but get the same result.
or
something to do with the way I'm handling the second file with opening/closing. I've tried moving or removing the close command and that either breaks the code or doesn't help.
You should only read the 90k-line file once, checking the fourth column of each line against the values from the other file as you go, instead of re-reading the whole large file once per line of the smaller one:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;

my ($file1, $file2) = @ARGV;

# Read the file of strings to match against
open my $fh1, '<', $file1 or die "Could not open $file1: $!";
my %words = map { chomp; $_ => 1 } <$fh1>;
close $fh1;

# Process the data file in one pass
open my $fh2, '<', $file2 or die "Could not open $file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    # Only look at the fourth column
    my @fields = split /\s+/, $line, 4;
    say $line if exists $words{$fields[3]};
}
close $fh2;
Note this uses a straight-up string comparison (via hash key lookup) against the last column instead of regular expression matching - your sample data looks like that's all that's needed. If you're using actual regular expressions, let me know and I'll update the answer.
Your code does look like it should work, just horribly inefficiently. In fact, after adjusting your sample data so that more than one line matches, it does print out multiple lines for me.
Slightly different approach to the problem:
use warnings;
use strict;
use feature 'say';

my $values = shift;

open my $fh1, '<', $values or die "Could not open $values";
my @lookup = <$fh1>;
close $fh1;
chomp @lookup;

my $re = join '|', map { '\b'.$_.'\b' } @lookup;

((split)[3]) =~ /$re/ && print while <>;
Run as script.pl value_file data_file
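Spelled out, that condensed last line is roughly equivalent to this (the data file is read via the <> operator):
while (<>) {                       # read the data file line by line into $_
    my @fields = split;            # split $_ on whitespace
    print if $fields[3] =~ /$re/;  # print the original line on a match
}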

Perl combine multiple file contents to single file

I have multiple log files, say file1.log, file2.log, file3.log etc. I want to combine these files' contents and put them into a single file called result_file.log
Is there any Perl module which can achieve this?
Update: Here is my code
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;

my @files;
my $dir = "/path/to/directory";

opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
    # We only want files
    next unless (-f "$dir/$file");
    # Use a regular expression to find files ending in .log
    next unless ($file =~ m/\.log$/);
    print "$file\n";
    push( @files, $file );
}
closedir(DIR);

print Dumper(\@files);

open my $out_file, ">result_file.log";
copy($_, $out_file) foreach ( @files );

exit 0;
Do you think it is a feasible solution?
The CPAN module File::Copy should do the job; you will have to open the output file yourself.
use File::Copy ;
open my $out, ">result.log" ;
copy($_, $out) foreach ('file1.log', 'file2.log', );
close $out ;
Update 1:
Based on additional information posted to the question, it looks like the aim is to concatenate (in Perl) a list of files matching a pattern (*.log). The code below extends the above solution, using glob to avoid the readdir loop and the filtering.
use File::Copy ;
open my $out, ">result.log" ;
copy($_, $out) foreach glob('/path/to/dir/*.log' );
close $out ;
Important notes:
* Using glob will SORT the file names alphabetically, while readdir does NOT guarantee any order.
* The output file 'result.log' itself matches '*.log', so do not run the code in the directory where the output is written, or it will be copied into itself (one way to filter it out is sketched below).
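If you do need to run it where the output lives, one untested workaround is to filter the output file out of the glob list before copying:
use File::Copy;
my $out_name = 'result.log';
open my $out, '>', $out_name or die $!;
# Skip the output file itself if it sits among the inputs
copy($_, $out) for grep { $_ ne $out_name } glob('*.log');
close $out;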
Do you think it is a feasible solution?
I'm afraid not. Your code is the equivalent of typing these commands at your prompt:
$ cp file1.log result_file.log
$ cp file2.log result_file.log
$ cp file3.log result_file.log
$ ... etc ...
The problem with this is that it copies each file, in turn over the top of the previous one. So you end up with a copy of the final file in the list.
As I said in a comment, this is most easily done using cat - no need for Perl at all.
$ cat file1.log file2.log file3.log > result_file.log
If you really want to do it in Perl, then something like this would work (the first section is rather similar to yours).
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @files;
my $dir = "/path/to/directory";

opendir(my $dh, $dir) or die $!;
while (my $file = readdir($dh)) {
    # We only want files
    next unless (-f "$dir/$file");
    # Use a regular expression to find files ending in .log
    next unless ($file =~ m/\.log$/);
    print "$file\n";
    push( @files, "$dir/$file" );
}
closedir($dh);

print Dumper(\@files);

open my $out_file, '>', 'result_file.log';
foreach my $fn (@files) {
    open my $in_file, '<', $fn or die "$fn: $!";
    print $out_file $_ while <$in_file>;
}

How to load a CSV file into a perl hash and access each element

I have a CSV file with the following information separated by commas ...
Owner,Running,Passing,Failing,Model
D42,21,54,543,Yes
T43,54,76,75,No
Y65,76,43,765,Yes
I want to open this CSV file and place its contents inside of a perl hash in my program. I am also interested in the code needed to print a specific element inside of the hash. For example, how I will print the "Passing" count for the "Owner" Y65.
The code I currently have:
$file = "path/to/file";
open $f, '<', $files, or die "cant open $file"
while (my $line = <$f>) {
#inside here I am trying to take the containments of this file and place it into a hash. I have tried numerous ways of trying this but none have seemed to work. I am leaving this blank because I do not want to bog down the visibility of my code for those who are kind enough to help and take a look. Thanks.
}
As well as placing the CSV file inside of a hash, I also need to understand the syntax to print and navigate through specific elements. Thank you very much in advance.
Here is an example of how to put the data into a hash %owners and later (after having read the file) extract a "passing count" for a particular owner. I am using the Text::CSV module to parse the lines of the file.
use feature qw(say);
use open qw(:std :utf8); # Assume UTF-8 files and terminal output
use strict;
use warnings qw(FATAL utf8);
use Text::CSV;

my $csv = Text::CSV->new()
    or die "Cannot use CSV: " . Text::CSV->error_diag();

my $fn = 'test.csv';
open my $fh, "<", $fn
    or die "Could not open file '$fn': $!";

my %owners;
my $header = $csv->getline( $fh ); # TODO: add error checking
while ( my $row = $csv->getline( $fh ) ) {
    next if @$row == 0; # TODO: more error checking
    my ($owner, @values) = @$row;
    $owners{$owner} = \@values;
}
close $fh;

my $key   = 'Y65';
my $index = 1;
say "Passing count for $key = ", $owners{$key}->[$index];
Since it's not really clear what "load a CSV file into a perl hash" means (nor does it really make sense; an array of hashes, one per row, maybe - sketched below - if you don't care about keeping the ordering of fields, but just a hash? What are the keys supposed to be?), let's focus on the rest of your question, in particular
how I will print the "Passing" count for the "Owner" Y65.
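As an aside, the "array of hashes, one per row" reading mentioned above might look like this - an untested sketch using Text::CSV's getline_hr, with the file name data.csv assumed:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', 'data.csv' or die $!;

# The first line becomes the hash keys for every following row
$csv->column_names( $csv->getline($fh) );

my @rows;
while ( my $row = $csv->getline_hr($fh) ) {
    push @rows, $row;   # one hashref per CSV row
}
close $fh;

# e.g. the Passing count for Owner Y65
my ($y65) = grep { $_->{Owner} eq 'Y65' } @rows;
print $y65->{Passing}, "\n" if $y65;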
There are a few other CSV modules that might be of interest that are much easier to use than Text::CSV:
Tie::CSV_File lets you access a CSV file like a 2D array. $foo[0][0] is the first field of the first row of the tied file.
So:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Tie::CSV_File;

my $csv = "data.csv";
tie my @data, "Tie::CSV_File", $csv or die "Unable to tie $csv!";
for my $row (@data) {
    say $row->[2] and last if $row->[0] eq "Y65";
}
DBD::CSV lets you treat a CSV file like a table in a database you can run SQL queries on.
So:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use DBI;

my $csv = "data.csv";
my $dbh = DBI->connect("dbi:CSV:", undef, undef,
    { csv_tables => { data => { f_file => $csv } } })
    or die $DBI::errstr;
my $owner = "Y65";
my $p = $dbh->selectrow_arrayref("SELECT Passing FROM data WHERE Owner = ?",
    {}, $owner);
say $p->[0] if defined $p;
Text::AutoCSV has a bunch of handy functions for working with CSV files.
So:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Text::AutoCSV;
my $csv = "data.csv";
my $acsv = Text::AutoCSV->new(in_file => $csv) or die "Unable to open $csv!";
my $row = $acsv->search_1hr("OWNER", "Y65");
say $row->{"PASSING"} if defined $row;
This last one is probably closest to what I think you think you want.

Perl, matching files of a directory, using an array with part of these file names

So, I have this directory with files named like this:
HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
HG00119.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
And I have an input.txt file that contains, one per line:
NA20828
HG00119
As you can see, the input.txt file has the beginning of the name of the files inside the directory.
What I want to do is to filter the files in the directory whose names begin with one of the entries in input.txt.
I don't know if I was clear, but here is the code I've done so far.
use strict;
use warnings;

my @lines;
my @files = glob("*.mapped*");

open (my $input, '<', 'input.txt') or die $!;
while (my $line = <$input>) {
    push (@lines, $line);
}
close $input;
I used the glob to filter only the files with mapped in the name, since I have other files there that I don't want to look for.
I tried some foreach loops, tried grep and regex also, and I'm pretty sure that I was going in the right way, and I think my mistake might be about scope.
I would appreciate any help guys! thanks!
OK, first off - your while loop is redundant. If you read from a filehandle in a list context, it reads the whole thing.
my @lines = <$input>;
will do the same as your while loop.
Now, for your patterns - you're matching one list against another list, but with partial matches.
chomp ( @lines );
foreach my $file ( @files ) {
    foreach my $line ( @lines ) {
        if ( $file =~ m/$line/ ) { print "$file matches $line\n"; }
    }
}
(And yes, something like grep or map can do this, but I always find those two make my head hurt - they're neater, but they're implicitly looping so you don't really gain much algorithmic efficiency).
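For completeness, an untested grep equivalent of the nested loop above (assuming @files and the chomped @lines from earlier), with \Q...\E added to neutralise any regex metacharacters in the lines:
my @matched = grep {
    my $file = $_;
    grep { $file =~ /\Q$_\E/ } @lines;   # inner grep: true if any line matches
} @files;
print "$_\n" for @matched;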
You can build a regular expression from the contents of input.txt like this
my @lines = do {
    open my $fh, '<', 'input.txt' or die $!;
    <$fh>;
};
chomp @lines;
my $re = join '|', @lines;
and then find the required files using
my @files = grep /^(?:$re)/, glob '*.mapped*';
Note that, if the list in input.txt contains any regex metacharacters, such as ., *, + etc. you will need to escape them, probably using quotemeta like this
my $re = join '|', map quotemeta, @lines;
and it may be best to do this anyway unless you are certain that there will never ever be such characters in the file.

Comparing two arrays in Perl

I know this has been asked before, and I know there are functions to make this easy in Perl. But what I want is advice on my specific code. I want to go through each line of text which I've read from a file, and compare it to the same line from another file, printing them if they are different.
I've tried as many variations of this as I could think of, and none work. This specific code which I'm posting thinks every element in the array is different from the one in the other array.
use 5.18.2;
use strict;
use utf8;

printf "This program only compares two files.\n"
     . "Here are the differences between "
     . $ARGV[0] . " and " . $ARGV[1] . ":\n";

open FIRST_FH, '<', $ARGV[0];
chomp(my @file1 = <FIRST_FH>);
close FIRST_FH;

open SECOND_FH, '<', $ARGV[1];
chomp(my @file2 = <SECOND_FH>);
close SECOND_FH;

for(my $i=0; $i < scalar @file1; ++$i){
    my $string = $file2[$i];
    unless($_ =~ /$string/){
        print "Difference found: @file1[$i], @file2[$i]\n";
    }
}
use utf8; just instructs the interpreter to read your source file as UTF-8. Use the open pragma to set the default IO layers to UTF-8 (or manually specify '<:encoding(UTF-8)' as the second argument to open).
Don't use printf when print will suffice (it usually does, due to interpolation). In this particular instance, I find a heredoc to be most readable.
It's inefficient to read both files into memory. Iterate over them lazily by taking one line at a time in a while loop.
Always check if open failed and include $! in the error message. Alternatively, use autodie;, which handles this for you. Also, use lexical filehandles; they'll automatically close when they go out of scope, and won't clash with other barewords (e.g. subroutines and built-ins).
Keeping in mind these suggestions, the new code would look like:
#!/usr/bin/perl
use 5.18.2; # Implicitly loads strict
use warnings;
use open qw(:encoding(utf8) :std);
print <<"EOT";
This program only compares 2 files.
Here are the differences between
$ARGV[0] and $ARGV[1]:
EOT
open(my $file1, '<', shift) or die $!;
open(my $file2, '<', shift) or die $!;

while (my $f1_line = <$file1>, my $f2_line = <$file2>)
{
    if ($f1_line ne $f2_line)
    {
        print $f1_line, $f2_line;
    }
}
But this is still a naive algorithm; if one file has a line removed, all subsequent lines will differ between files. To properly achieve a diff-like comparison, you'll need an implementation of an algorithm that finds the longest common subsequence. Consider using the CPAN module Algorithm::Diff.
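An untested sketch of that with Algorithm::Diff, assuming @file1 and @file2 hold the chomped lines of the two files:
use Algorithm::Diff qw(diff);

my @hunks = diff( \@file1, \@file2 );
for my $hunk (@hunks) {
    for my $change (@$hunk) {
        my ($op, $pos, $text) = @$change;   # '+' = added, '-' = removed
        print "$op line $pos: $text\n";
    }
}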
Why are you comparing using $_, which you haven't defined anywhere?
my $string = $file2[$i];
unless($_ =~ /$string/){
Simply compare the lines using eq or ne:
if ( $file1[$i] ne $file2[$i] ) {
However, I would recommend that you make a lot of stylistic changes to your script, starting with doing line by line processing instead of slurping in the files. The following is how I would completely rewrite it:
use 5.18.2;
use strict;
use warnings;
use autodie;
use utf8;

my ( $file1, $file2 ) = @ARGV;

open my $fh1, '<', $file1;
open my $fh2, '<', $file2;

while ( !eof($fh1) && !eof($fh2) ) {
    chomp( my $line1 = <$fh1> );
    chomp( my $line2 = <$fh2> );
    if ( $line1 ne $line2 ) {
        warn "Difference found on line $.:\n $line1\n $line2\n";
    }
}
warn "Still more data in $file1\n" if !eof $fh1;
warn "Still more data in $file2\n" if !eof $fh2;
