Perl: combine multiple file contents into a single file

I have multiple log files, say file1.log, file2.log, file3.log, etc. I want to combine the contents of these files and put them into a single file called result_file.log.
Is there any Perl module which can achieve this?
Update: Here is my code
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
my @files;
my $dir = "/path/to/directory";
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .log
next unless ($file =~ m/\.log$/);
print "$file\n";
push( @files, $file);
}
closedir(DIR);
print Dumper(\@files);
open my $out_file, ">result_file.log" ;
copy($_, $out_file) foreach ( @files );
exit 0;
Do you think it is feasible solution?

The core module File::Copy should do the work; you will have to open the output file yourself.
use File::Copy;
open my $out, '>', 'result.log' or die "Cannot open result.log: $!";
copy($_, $out) foreach ('file1.log', 'file2.log');
close $out;
Update 1:
Based on the additional information added to the question, it looks like the task is to concatenate (in Perl) a list of files matching a pattern (*.log). The code below extends the solution above, using glob to avoid the readdir loop and the filtering.
use File::Copy;
open my $out, '>', 'result.log' or die "Cannot open result.log: $!";
copy($_, $out) foreach glob('/path/to/dir/*.log');
close $out;
Important notes:
* glob returns the file names sorted alphabetically, while readdir does NOT guarantee any order.
* The output file 'result.log' itself matches '*.log', so do not run the code with /path/to/dir as the current directory.
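The two notes above can be sketched as a small helper. This is only an illustration under stated assumptions: the name concat_logs is hypothetical, bsd_glob from File::Glob is used instead of plain glob so that a directory name containing spaces is not split, and the output file is filtered out explicitly in case it lives in the same directory as the inputs.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Spec;

# Hypothetical helper: concatenate every *.log file in $dir into $out_path,
# in sorted name order. The output file itself is skipped if it matches.
sub concat_logs {
    my ($dir, $out_path) = @_;
    my $out_abs = File::Spec->rel2abs($out_path);

    # Build the input list before the output file is created or truncated.
    my @inputs = grep { File::Spec->rel2abs($_) ne $out_abs }
                 sort( bsd_glob("$dir/*.log") );

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    for my $path (@inputs) {
        open my $in, '<', $path or die "Cannot read $path: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out or die "Cannot close $out_path: $!";
    return scalar @inputs;    # number of files that were concatenated
}
```

A call such as concat_logs('/path/to/dir', 'result.log') would then be safe to repeat, because an existing result.log is never read back into itself.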

Do you think it is feasible solution?
I'm afraid not. Your code is the equivalent of typing these commands at your prompt:
$ cp file1.log result_file.log
$ cp file2.log result_file.log
$ cp file3.log result_file.log
$ ... etc ...
The problem with this is that it copies each file in turn over the top of the previous one, so you end up with a copy of the final file in the list.
As I said in a comment, this is most easily done using cat; there is no need for Perl at all.
$ cat file1.log file2.log file3.log > result_file.log
If you really want to do it in Perl, then something like this would work (the first section is rather similar to yours).
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @files;
my $dir = "/path/to/directory";
opendir(my $dh, $dir) or die $!;
while (my $file = readdir($dh)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .log
next unless ($file =~ m/\.log$/);
print "$file\n";
push( @files, "$dir/$file");
}
closedir($dh);
print Dumper(\@files);
open my $out_file, '>', 'result_file.log' or die "result_file.log: $!";
foreach my $fn (@files) {
open my $in_file, '<', $fn or die "$fn: $!";
# Copy the input file to the output, one line at a time
print $out_file $_ while <$in_file>;
}

Related

Creating array of file names using grep

I'm having difficulty outputting file names as an array using grep. Specifically, I want to create an array of file names (plant photos) formatted like this:
Ilex_verticillata= Ilex_verticillata1.png, Ilex_verticillata2.png, Ilex_verticillata3.png
Asarum_canadense= Asarum_canadense1.png, Asarum_canadense2.png
Ageratina_altissi= Ageratina_altissi1.png, Ageratina_altissi2.png
Here's my original Perl script that I'm attempting to modify. It returns, as intended, ONE file name per plant as "Genus_species", printing a list of those plants:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files =
map { s/1\.png\z/.png/r } # Removes "1" from end of file names
grep { /^[^2-9]*\.png\z/i && /_/ } # Finds "Genus_species.png" & "Genus_species1.png" and returns one file name per plant as "Genus_species.png"
readdir $dfh;
foreach my $file (@files) {
$file =~ s/\.png//; # Removes ".png" extension
print "$file\n"; #Prints list of file names (plant names)
}
Here's the output:
Ilex_verticillata
Asarum_canadense
Ageratina_altissima
However, since each plant often has MULTIPLE photos (e.g.-- "Genus_species1.png, Genus_species2.png, etc.), I need to re-grep the directory using the above output to find their file names, then output the results in the form of an array as previously illustrated.
I know the solution likely involves modifying the "foreach" statement, using grep to return ALL file names with "Genus_species" in their name. Here's what I tried:
foreach my $file (@files) {
$file =~ s/\.png//;
grep ($file,readdir(DIR));
print "$file = $file\n";
}
But the output was this:
Ilex_verticillata = Ilex_verticillata
Asarum_canadense = Asarum_canadense
Ageratina_altissima = Ageratina_altissima
Again, I want to output an array formatted as:
"Genus_species= Genus_species1.png, Genus_species2.png, etc.," meaning I want it to look like this:
Ilex_verticillata= Ilex_verticillata1.png, Ilex_verticillata2.png, Ilex_verticillata3.png
Asarum_canadense= Asarum_canadense1.png, Asarum_canadense2.png
Ageratina_altissi= Ageratina_altissi1.png, Ageratina_altissi2.png
Notice that I also want to add back the ".png" extension ONLY to the file names to the right of the equals sign.
Please advise. Thanks.
readdir returns a list of the entries in the directory. You've chained everything into one statement, which is compact. However, if you loop over the entries, you can process each one further.
#!/usr/bin/perl
use strict;
use warnings;
use English; ## use names rather than symbols for special variables
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir my $dfh, $dir or die "Can't open $dir: $OS_ERROR";
my %genus_species; ## store matching entries in a hash
for my $file (readdir $dfh)
{
next unless $file =~ /\d\.png$/; ## skip entry if not a png file ending with a number
my $genus = $file =~ s/\d\.png$//r;
push(@{$genus_species{$genus}}, $file); ## push to the array; the @{} dereferences the hash entry as an array reference
}
for my $genus (keys %genus_species)
{
print "$genus = ";
print "$_ " for sort @{$genus_species{$genus}}; # sort and loop through the entries in the array reference
print "\n";
}
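The grouping above can also be written as two small functions that produce the comma-separated "Genus_species= ..." lines the question asks for. This is a minimal sketch, not the answerer's code: the names group_photos and format_groups are hypothetical, and the pattern accepts one or more trailing digits rather than exactly one.

```perl
use strict;
use warnings;

# Hypothetical helper: group names such as "Ilex_verticillata2.png" under
# "Ilex_verticillata". Names without digits before ".png" are ignored.
sub group_photos {
    my @names = @_;
    my %photos_for;
    for my $name (@names) {
        next unless $name =~ /\A(.+?)\d+\.png\z/i;
        push @{ $photos_for{$1} }, $name;
    }
    return \%photos_for;
}

# Hypothetical helper: one "Genus_species= a.png, b.png" string per plant,
# with plants and photo names both sorted as strings.
sub format_groups {
    my ($photos_for) = @_;
    return map { "$_= " . join(', ', sort @{ $photos_for->{$_} }) }
           sort keys %$photos_for;
}
```

With a directory handle open as in the code above, print "$_\n" for format_groups(group_photos(readdir $dfh)); would print one line per plant.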

"No Such File Error" when trying to open each fasta file stored in an array

How can I open each file in a folder in sequential order, perform a regex search on the contents of each file, and store the matches in another array?
Here is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
my $dir = ("/path/to/folder");
my @ArrayofFiles;
my @TrimmedSequences;
opendir( my $dh, $dir ) || die;
#make an array of fasta files from a folder
while ( readdir $dh ) {
chomp;
my $fileName = $_;
if ($fileName =~ /\.fasta.*/) {
push(@ArrayofFiles, $fileName);
}
}
#this diagnostic print statement shows that I do get the proper files into the target array. I leave it commented out when I run the script.
#print join("\n", @ArrayofFiles), "\n";
#now I want to open each file in the array, search file contents, and add the result to another array
foreach my $file (@ArrayofFiles){
open (my $sequence, '<', $file) or die $!;
while (my $line = <$sequence>) {
if ($line =~ m/(CTCCCA)[TAGC]+(TCAGGA)/) {
push(@TrimmedSequences, $line);
}
}
}
When I run this code, I get the following error message:
"Uncaught exception from user code: No such file or directory at /Users/roblogan/Documents/BIOL6309/Manipulating fast5 files/Attempt 5 line 23."
Line 24 is "open (my $sequence, '<', $file) or die $!;"
My diagnostic print statement shows that I am working with an array of the expected fasta files.
I would be very grateful for any help I can get. Thank you so much.
-Rob
@ArrayofFiles just contains the file names; it doesn't include the directory prefix. So you're trying to open the files in the current directory rather than in the directory you listed.
Use:
push(@ArrayofFiles, "$dir/$fileName");
to get the full path.
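The same idea can be packaged as a function that returns ready-to-open paths. This is a sketch under assumptions: fasta_paths is a hypothetical name, File::Spec->catfile is used in place of string interpolation to join the directory and file name, and the names are sorted because readdir does not promise any order.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full paths of the entries in $dir whose
# names contain ".fasta", sorted so they are processed in a stable order.
sub fasta_paths {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = sort grep { /\.fasta/ } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } @names;
}
```

Each returned path can be passed straight to open, whatever the current directory is.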

Perl: Replace strings in multiple files with array entry

I am looking for a simple way to replace strings in multiple text files. In the first file the string should be replaced with the first element of the array @arrayF; in the second file the string must be replaced with the second entry, etc.
I want to replace ;size=\d+ where \d+ is a wildcard for any number.
This is what I have so far:
#!/usr/bin/perl -w
use strict;
use warnings;
my $counter = 0;
my @arrayF = '/Users/majuss/Desktop/filelist.txt>'; # Reads all lines into array
my @files = '/Users/majuss/Desktop/New_Folder/*'; #get Files into an array
foreach my $file ( @files ) {
$file =~ s/;size=\d+/$arrayF[$counter]/g; #subst.
print
$counter++; #increment array index
}
It gives a zero back and nothing happens.
I know how to do it in a one-liner but I can't figure a way out how to implement an array there.
Note these points that I commented on beneath your question:
* The line commented "Reads all lines into array" doesn't do that. It simply sets @arrayF to a one-element list that holds the string /Users/majuss/Desktop/filelist.txt>. You probably need to open the file and read its contents into an array.
* The line commented "get Files into an array" doesn't do that. It simply sets @files to a one-element list that holds the string /Users/majuss/Desktop/New_Folder/*. You probably need to use glob to expand the wildcard into a list of files.
* The statement
$file =~ s/;size=\d+/$arrayF[$counter]/g
is attempting to modify the variable $file, which contains the name of the file. Presumably you meant to edit the contents of that file, so you must open and read it first.
* Please don't use upper-case letters in your local identifiers.
* Don't use -w on the shebang line as well as use warnings; just the latter is correct.
This seems to do what you're asking for, but be aware that it is untested except that I have checked that it will compile. Be careful that you have a backup of the original files as this code will overwrite the original files with the modified data
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use autodie;
my $replacement_text = '/Users/majuss/Desktop/filelist.txt';
my $file_glob = '/Users/majuss/Desktop/New_Folder/*';
my @replacement_text = do {
open my $fh, '<', $replacement_text;
<$fh>;
};
chomp @replacement_text;
my $i = 0;
for my $file ( glob $file_glob ) {
my $contents = do {
open my $in_fh, '<', $file;
local $/;
<$in_fh>;
};
$contents =~ s/;size=\d+/$replacement_text[$i]/g;
open my $out_fh, '>', $file;
print $out_fh $contents;
++$i;
}
You're not opening filelist.txt and reading it.
To do this you need to:
open ( my $input, "<", '/Users/majuss/Desktop/filelist.txt' ) or die $!;
chomp ( my @arrayF = <$input> ); # read all lines and strip the newlines
close ( $input );
You need to use glob to search a directory pattern like that.
Like this:
foreach my $file ( glob ( '/Users/majuss/Desktop/New_Folder/*' ) ) {
# stuff
}
To search and replace within a file, it's actually a bit different to a one liner. You can look at 'in place editing' in perlrun - but this is where perl is trying to pretend to be sed. I think you can do it if you try - there's an option in perlvar:
$^I
The current value of the inplace-edit extension. Use undef to disable inplace editing.
Mnemonic: value of -i switch.
This answer may give some insight:
In-place editing of multiple files in a directory using Perl's diamond and in-place edit operator
Instead you can:
foreach my $file ( glob ( '/Users/majuss/Desktop/New_Folder/*' ) ) {
open ( my $input_fh, "<", $file ) or die $!;
open ( my $output_fh, ">", "$file.NEW" ) or die $!;
my $replace = shift ( @arrayF );
while ( my $line = <$input_fh> ) {
$line =~ s/;size=\d+/$replace/g;
print {$output_fh} $line;
}
close ( $input_fh );
close ( $output_fh );
#rename 'output'.
}
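The rename step left as a comment in the loop above could look like the following. It is a minimal sketch for one file: the name replace_size_in_file is hypothetical, and it assumes "$file.NEW" is on the same filesystem as the original so that the built-in rename can replace it.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every ";size=<digits>" in $file with
# $replacement. The new content is written to "$file.NEW" first and then
# renamed over the original file.
sub replace_size_in_file {
    my ($file, $replacement) = @_;
    my $tmp = "$file.NEW";

    open my $in,  '<', $file or die "Cannot read $file: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";
    while (my $line = <$in>) {
        $line =~ s/;size=\d+/$replacement/g;
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";

    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";
    return;
}
```

Writing to a separate file and renaming afterwards means a failure part-way through leaves the original untouched, which is the main reason to prefer it over truncating the file in place.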

Perl, matching files of a directory, using an array with part of the these file names

So, I have this directory with files named like this:
HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
HG00119.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
And I have an input.txt file that contains the following, one entry per line:
NA20828
HG00119
As you can see, the input.txt file has the beginning of the name of the files inside the directory.
What I want to do is to filter the files in the directory that have the name (in this case just the beginning), inside the input.txt.
I don't know if I was clear, but here is the code I've done so far.
use strict;
use warnings;
my @lines;
my @files = glob("*.mapped*");
open (my $input,'<','input.txt') or die $!;
while (my $line = <$input>) {
push (@lines, $line);
}
close $input;
I used the glob to filter only the files with mapped in the name, since I have other files there that I don't want to look for.
I tried some foreach loops, tried grep and regex also, and I'm pretty sure that I was going in the right way, and I think my mistake might be about scope.
I would appreciate any help guys! thanks!
OK, first off - your while loop is redundant. If you read from a filehandle in a list context, it reads the whole thing.
my @lines = <$input>;
will do the same as your while loop.
Now, for your patterns - you're matching one list against another list, but partial matches.
chomp ( @lines );
foreach my $file ( @files ) {
foreach my $line ( @lines ) {
if ( $file =~ m/$line/ ) { print "$file matches $line\n"; }
}
}
(And yes, something like grep or map can do this, but I always find those two make my head hurt - they're neater, but they're implicitly looping so you don't really gain much algorithmic efficiency).
You can build a regular expression from the contents of input.txt like this
my @lines = do {
open my $fh, '<', 'input.txt' or die $!;
<$fh>;
};
chomp @lines;
my $re = join '|', @lines;
and then find the required files using
my @files = grep /^(?:$re)/, glob '*.mapped*';
Note that, if the list in input.txt contains any regex metacharacters, such as ., *, + etc. you will need to escape them, probably using quotemeta like this
my $re = join '|', map quotemeta, @lines;
and it may be best to do this anyway unless you are certain that there will never ever be such characters in the file.
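Put together, the quotemeta variant might look like this. It is a sketch rather than a drop-in replacement: files_with_prefix is a hypothetical name, and the early return for an empty prefix list is an added guard, since an empty alternation would match every file name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file names that start with any of the
# given prefixes. quotemeta keeps characters such as "." or "+" literal.
sub files_with_prefix {
    my ($prefixes, $files) = @_;
    return () unless @$prefixes;    # an empty alternation would match everything
    my $alternation = join '|', map { quotemeta($_) } @$prefixes;
    my $re = qr/\A(?:$alternation)/;
    return grep { $_ =~ $re } @$files;
}
```

With @lines read and chomped as above, my @files = files_with_prefix(\@lines, [ glob '*.mapped*' ]); gives the filtered list.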

I need help around file access and modification in Perl

I have a folder "segmentation" containing the ".purseg" files I need (x.purseg, y.purseg, z.purseg). They are a kind of text file.
Their form is:
'0.1 4.5 speech_L1'
'4.7 9.2 speech_L2'
etc.
I also have a folder "audio" containing the audio files: x.wav, y.wav, z.wav.
Each ".purseg" file matches a ".wav" file; they both have the same name.
My script has to read the information in the ".purseg" file and, based on it, cut the part I need out of the wav file (the speaker marked as speech_L2). I made a script that works if the ".purseg" and ".wav" files are in the same folder, but because I am working with a lot of data I need to fix it so that it works with separate folders. Here is the script:
#! /usr/bin/perl -w
use List::MoreUtils qw(uniq);
use File::Path qw(make_path);
use File::Copy "cp";
use warnings;
my $directory = '/home/taurete/Desktop/diar_fem_fin/segmentation/';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR))
{
next unless ($file =~ m/\.purseg$/);
$file =~ s{\.[^.]+$}{};
push (@list1, $file);
# print "$file\n";
}
my $list=@list1;
# print "$list";
$i=0;
while ($i<$list)
{
my $nume1=$list[$i];
open my $fh, "$nume1.purseg" or die $!;
my @file_array;
while (my $line = <$fh>)
{
chomp $line;
my @line_array = split(/\s+/, $line);
push (@file_array, \@line_array);
}
my @arr=@file_array;
$cont1=0;
my $s1= @arr;
for (my $i=0;$i < $s1;$i++)
{
$directory="$nume1";
make_path($directory);
if ("speech_L2" eq "$arr[$i][2]")
{
my $directory = '/home/taurete/Desktop/data/audio/';
opendir (DIR, $directory) or die $!;
$interval = $arr[$i][1] - $arr[$i][0];
$speakername=$nume1._.$cont1;
`sox $nume1.wav ./$directory/$speakername.wav trim $arr[$i][0] $interval`;
$cont1++;
}
}
$i++;
}
Here is what i get:
Name "main::list" used only once: possible typo at ./spkfinal.pl line 23.
Use of uninitialized value $nume1 in concatenation (.) or string at ./spkfinal.pl line 27.
No such file or directory at ./spkfinal.pl line 27.
To answer your question about Name "main::list" used only once: possible typo at ./spkfinal.pl line 23., change:
my $nume1=$list[$i];
to:
my $nume1=$list1[$i];
You do not have an array @list, but you do have an array @list1.
I think that will clear up your subsequent warnings, too.
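The script in the question enables warnings but not strict, which is why the misspelled array only produced a warning. Under use strict, an undeclared @list is a compile-time error, although the script's other undeclared variables would then need my as well. As an illustration, here is a minimal sketch of the segment-reading part written with strict enabled. The name speaker_segments is hypothetical, and it assumes the lines in a ".purseg" file are plain "start end label" text without the surrounding quotes shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: read a .purseg file with lines such as
# "0.1 4.5 speech_L1" and return [start, duration] pairs for one speaker.
sub speaker_segments {
    my ($path, $speaker) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my @segments;
    while (my $line = <$fh>) {
        my ($start, $end, $label) = split ' ', $line;
        next unless defined $label && $label eq $speaker;
        push @segments, [ $start, $end - $start ];
    }
    close $fh;
    return @segments;
}
```

Each pair holds the two numbers that the sox trim call in the question takes: the start time and the length of the interval.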
