I need help with file access and modification in Perl - arrays

I have a folder "segmentation" containing the ".purseg" files I need (x.purseg, y.purseg, z.purseg). They are essentially plain text files.
Their form is:
'0.1 4.5 speech_L1'
'4.7 9.2 speech_L2'
etc.
I also have a folder "audio" containing the audio files: x.wav, y.wav, z.wav.
Each ".purseg" file matches a ".wav" file; they both have the same base name.
My script has to read the information from the ".purseg" file and, based on it, cut out of the wav file the parts that I need (the segments of the speaker labelled speech_L2). I made a script that works if I have both the ".purseg" and the ".wav" file in the same folder, but because I am working with a lot of data I need to fix my script so that it works with the folders. Here is the script:
#! /usr/bin/perl -w
use List::MoreUtils qw(uniq);
use File::Path qw(make_path);
use File::Copy "cp";
use warnings;
my $directory = '/home/taurete/Desktop/diar_fem_fin/segmentation/';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR))
{
next unless ($file =~ m/\.purseg$/);
$file =~ s{\.[^.]+$}{};
push (@list1, $file);
# print "$file\n";
}
my $list=@list1;
# print "$list";
$i=0;
while ($i<$list)
{
my $nume1=$list[$i];
open my $fh, "$nume1.purseg" or die $!;
my @file_array;
while (my $line = <$fh>)
{
chomp $line;
my @line_array = split(/\s+/, $line);
push (@file_array, \@line_array);
}
my @arr=@file_array;
$cont1=0;
my $s1= @arr;
for (my $i=0;$i < $s1;$i++)
{
$directory="$nume1";
make_path($directory);
if ("speech_L2" eq "$arr[$i][2]")
{
my $directory = '/home/taurete/Desktop/data/audio/';
opendir (DIR, $directory) or die $!;
$interval = $arr[$i][1] - $arr[$i][0];
$speakername=$nume1._.$cont1;
`sox $nume1.wav ./$directory/$speakername.wav trim $arr[$i][0] $interval`;
$cont1++;
}
}
$i++;
}
Here is what I get:
Name "main::list" used only once: possible typo at ./spkfinal.pl line 23.
Use of uninitialized value $nume1 in concatenation (.) or string at ./spkfinal.pl line 27.
No such file or directory at ./spkfinal.pl line 27.

To answer your question about the warning Name "main::list" used only once: possible typo at ./spkfinal.pl line 23, change:
my $nume1=$list[$i];
to:
my $nume1=$list1[$i];
You do not have an array @list, but you do have an array @list1.
I think that will clear up your subsequent warnings, too.
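For the folder issue itself, here is a minimal sketch of how the listing loop could be reworked so the script opens each .purseg file by its full path (the directory path is taken from your question; treat this as a starting point, not a drop-in replacement):
use strict;
use warnings;

my $seg_dir = '/home/taurete/Desktop/diar_fem_fin/segmentation';

opendir(my $dh, $seg_dir) or die "Cannot open $seg_dir: $!";
my @list1;
while (my $file = readdir($dh)) {
    next unless $file =~ m/\.purseg$/;
    $file =~ s{\.[^.]+$}{};      # strip the extension, keep the base name
    push @list1, $file;
}
closedir($dh);

for my $nume1 (@list1) {
    # open the segmentation file by its full path, not just the base name
    open my $fh, '<', "$seg_dir/$nume1.purseg"
        or die "Cannot open $seg_dir/$nume1.purseg: $!";
    # ... per-file processing as in the original script ...
    close $fh;
}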

Related

Perl combine multiple file contents to single file

I have multiple log files, say file1.log, file2.log, file3.log, etc. I want to combine the contents of these files and put them into a single file called result_file.log.
Is there any Perl module which can achieve this?
Update: Here is my code
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
my @files;
my $dir = "/path/to/directory";
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .log
next unless ($file =~ m/\.log$/);
print "$file\n";
push( @files, $file);
}
closedir(DIR);
print Dumper(\@files);
open my $out_file, ">result_file.log" ;
copy($_, $out_file) foreach ( @files );
exit 0;
Do you think it is a feasible solution?
The CPAN module File::Copy should do the job; you will have to open the output file yourself.
use File::Copy ;
open my $out, ">result.log" ;
copy($_, $out) foreach ('file1.log', 'file2.log', );
close $out ;
Update 1:
Based on the additional information posted in the question, it looks like the ask is to concatenate (in Perl) a list of files matching a pattern (*.log). The code below extends the above solution with that logic, using glob and avoiding the readdir loop and filtering.
use File::Copy ;
open my $out, ">result.log" ;
copy($_, $out) foreach glob('/path/to/dir/*.log' );
close $out ;
Important notes:
* Using glob will SORT the file names alphabetically, while readdir does NOT guarantee any order.
* The output file 'result.log' itself matches '*.log', so do not run the code in the directory that holds the input logs, or it will end up copying itself.
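If the output file has to live in the same directory as the inputs, one option (my own sketch, not part of the solution above) is to filter it out of the glob list:
use File::Copy;
open my $out, '>', 'result.log' or die $!;
copy($_, $out) foreach grep { $_ !~ /result\.log$/ } glob('/path/to/dir/*.log');
close $out;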
Do you think it is a feasible solution?
I'm afraid not. Your code is the equivalent of typing these commands at your prompt:
$ cp file1.log result_file.log
$ cp file2.log result_file.log
$ cp file3.log result_file.log
$ ... etc ...
The problem with this is that it copies each file, in turn over the top of the previous one. So you end up with a copy of the final file in the list.
As I said in a comment, this is most easily done using cat - no need for Perl at all.
$ cat file1.log file2.log file3.log > result_file.log
If you really want to do it in Perl, then something like this would work (the first section is rather similar to yours).
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @files;
my $dir = "/path/to/directory";
opendir(my $dh, $dir) or die $!;
while (my $file = readdir($dh)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .log
next unless ($file =~ m/\.log$/);
print "$file\n";
push( @files, "$dir/$file");
}
closedir($dh);
print Dumper(\@files);
open my $out_file, '>', 'result_file.log';
foreach my $fn (@files) {
open my $in_file, '<', $fn or die "$fn: $!";
print $out_file $_ while <$in_file>;
}
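Equivalently, here is a sketch using plain core Perl that slurps each file in one go instead of looping line by line:
open my $out_file, '>', 'result_file.log' or die $!;
foreach my $fn (@files) {
    open my $in_file, '<', $fn or die "$fn: $!";
    print {$out_file} do { local $/; <$in_file> };   # slurp the whole file and write it out
    close $in_file;
}
close $out_file;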

Array elements get deleted when looping through files

I have a problem when looping through file names: my input array's elements get deleted.
CODE:
use Data::Dumper;
use warnings;
use strict;
my @files = ("file1", "file2", "file3");
print Dumper(\@files);
for (@files) {
my $filename = $_ . '.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while(<$fh>) {
print "$filename read line \n";
}
}
print Dumper(\@files);
OUTPUT:
$VAR1 = [
'file1',
'file2',
'file3'
];
file1.txt read line
file2.txt read line
file3.txt read line
$VAR1 = [
undef,
undef,
undef
];
FILE CONTENTS:
cat file1.txt
asdfsdfs
cat file2.txt
iasdfasdsf
cat file3.txt
sadflkjasdlfj
Why do the array contents get deleted?
(I have 2 different workarounds for the problem, but I would like to understand what's the problem with this code.)
while (<$fh>)
is short for
while (defined($_ = <$fh>))
so you are clobbering $_ which is aliased to an element of @files. You need to protect $_ as follows:
while (local $_ = <$fh>)
Better yet, use a different variable name.
while (my $line = <$fh>)
You're using $_ in two different ways inside of the loop (as the current filename and as the current line) and they're clobbering each other. Don't do this. Name your variables, e.g.:
for my $file (@files) {
...
while(my $line = <$fh>) {
...
}
}
You can imagine that your current code does this after reading each file:
for (@files) {
undef $_;
}
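A tiny self-contained demonstration of the aliasing (my own example, not part of the original post): assigning to $_ inside for (@files) writes straight back into the array.
use strict;
use warnings;
use Data::Dumper;

my @files = ("file1", "file2", "file3");
for (@files) {
    $_ = undef;              # same effect the while (<$fh>) loop has at EOF
}
print Dumper(\@files);       # $VAR1 = [ undef, undef, undef ];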

Save all Data in array, Filter out duplicated Data, Compare Data between arrays and Remove the matched Data

I have some problems regarding my script.
The problems are:
The value of $str or @matchedPath is sometimes blank when I print it out. It is not random; it happens only for certain paths in the table.txt file, and I can't figure out why.
How do I print the expected outcome below? I can't report the correct file location of each table.txt, because I have put all the path locations in one array, filtered it and compared it with the matched paths, and because of this some locations are missing when printed out.
Example paths contained in the /home/is/latest/table.txt file; the second (PATH) column holds the wanted paths:
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
Example path contained in the /home/are/latest/table.txt file; again the middle (PATH) column holds the wanted path:
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
The list.txt file contains, e.g.:
rty/b
uio/b/c
qwe/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
Expected outcome:
Unmatched Path : b/c/d/g
table.txt file location: /home/is/latest/table.txt
Unmatched Path : b/c/d/j
table.txt file location: /home/are/latest/table.txt
Below is my detailed script,
#!/usr/perl/5.14.1/bin/perl
# I want to make a script that automatically compares the paths in table.txt with list.txt.
# The table.txt files are located under a parent directory and differ in their subdirectories.
# There are about 10 table.txt files and each one needs to be compared with list.txt.
# The objective is to print out the paths that are not in list.txt.
use strict;
use warnings;
use Switch;
use Getopt::Std;
use Getopt::Long;
use Term::ANSIColor qw(:constants);
use File::Find::Rule;
use File::Find;
use File::Copy;
use Cwd;
use Term::ANSIColor;
my $path1='/home'; #Automatically search all table.txt file in this directory even in subdirectory
my $version='latest'; #search the file specified subdirectory e.g. /home/is/latest/table.txt and /home/are/latest/table.txt
my $path2='/list.text'; #there is about 10 table.txt files which contain specified paths in it.
$path1 =~ s/^\s+|\s+$//g;
$version =~ s/^\s+|\s+$//g;
$path2 =~ s/^\s+|\s+$//g;
my @files = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my @symlink_dirs = File::Find::Rule->directory->symlink->in($path1); #If the directory is a symlink, in my case 'latest' is a symlink directory
print colored (sprintf ("\n\n\tSUMMARY REPORT"),'bold','magenta');
print "\n\n_______________________________________________________________________________________________________________________________________________________\n\n";
if ($version eq "latest")
{
foreach my $dir (@symlink_dirs)
{
my @filess = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my $symDir=($dir."/"."table.txt");
$symDir =~ s/^\s+|\s+$//g;
my $wantedPath=$symDir;
my $path_1 = $wantedPath;
function($path_1);
}
}
else
{
for my $file (@files)
{
if ($file =~ m/.*$version.*/)
{
my $wantedPath=$file;
my $path_1 = $wantedPath;
function($path_1);
}
}
}
sub function
{
my $path_1 = $_[0];
open DATA, '<', $path_1 or die "Could not open $path_1: $!";
my $path_2 = "$path2";
open DATA1, '<', $path_2 or die "Could not open $path_2: $!";
################# FOCUSED PROBLEM AREA ##############################
my @matchedPath;
my @matched_File_Path;
my @unmatchedPath;
my @unmatched_File_Path;
my @s2 = <DATA1>;
while(<DATA>)
{
my $s1 = $_;
if ($s1 =~ /^#.*/)
{
next;
}
if ($s1 =~ /(.*)\s+(.*)\s+(.*)\s+/)
{
my $str=($2);
$str =~ s/\s+//g;
for my $s2 (@s2)
{
if ($s2 =~ /.*$str/)
{
push @matchedPath,$str;
push @matched_File_Path,$path_1;
print "matched Path: $str\n\t$path_1\n"; #I don't understand, sometimes I get empty $str value in this. Can anyone help me?
last;
}
else
{
#print "unmatch:$str\n\t$path_1\n";
push @unmatchedPath,$str;
push @unmatched_File_Path,$path_1;
}
}
}
}
foreach (@unmatchedPath)
{print "unmatch path: $_\n";}
foreach (@matchedPath)
{print "\nmatch path: $_\n\n";}
foreach (@unmatched_File_Path)
{print "unmatch File Path: $_\n";}
foreach (@matched_File_Path)
{print "match File Path: $_\n";}
my @filteredUnmatchedPath = uniq(@unmatchedPath);
my @filteredUnmatched_IP_File_Path =uniq(@unmatched_IP_File_Path);
@filteredUnmatchedPath = grep {my $filteredPath = $_; not grep $_ eq $filteredPath, @matchedPath} @filteredUnmatchedPath;
}
print "@filteredUnmatchedPath\n";
print "@filteredUnmatched_IP_File_Path\n";
sub uniq
{
my %seen;
grep !$seen{$_}++, @_;
}
close(DATA);
close(DATA1);
print "_________________________________________________________________________________________________________________________________________________________\n\n";
I think using hashes is much simpler here.
Here's what I tried.
You will have to replace @all_path with your array containing every path where a table.txt is present.
use strict;
use warnings;
my @all_path =("some/location/table.txt","some/location_2/table.txt");
my %table_paths;
my %list_paths;
foreach my $path (@all_path)
{
open (my $table, "<", $path) or die ("error opening file");
#we create hash, each key is a path
while (<$table>)
{
chomp;
#only process lines starting with "a" as it seems to be the format of this file
$table_paths{(split)[1]}=$path if (/^a/); #taking the 2nd element in each line
}
close $table;
}
open (my $list, "<", "list.txt") or die ("error opening file");
#we create hash, each key is a path
while (<$list>)
{
chomp;
$list_paths{$_}=1;
}
close $list;
#now we delete from table_paths the keys that also appear in list; what is left is unmatched
foreach my $key (keys %table_paths)
{
delete $table_paths{$key} if (grep {$_ =~ /$key$/} (keys %list_paths));
}
#printing unmatched keys
print "unmatched :$_\nlocation: $table_paths{$_}\n\n" foreach keys %table_paths;
inputs
in some/location/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
in some/location_2/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
in list.txt
rty/b
uio/b/c
qwe/dummyName/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
output:
unmatched: b/c/d/g
location: some/location/table.txt
unmatched: b/c/d/j
location: some/location_2/table.txt
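One small hardening I would add (my own suggestion, not part of the answer above): quote the key with \Q...\E inside the match, so path characters like '.' are treated literally rather than as regex metacharacters:
foreach my $key (keys %table_paths)
{
    delete $table_paths{$key} if (grep { /\Q$key\E$/ } keys %list_paths);
}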

How to copy a list of files in perl from one directory to another

In UNIX I get the files from DIRECTORY_PATH based on the file format and last modified date and move them to ARCHIVE_DIRECTORY.
For that I used the piece of shell code below:
DIRECTORY_PATH=/apps/data/central_archive/
NO_OF_DAYS_ARCHIVE=10
FILE_FORMAT=txt
EXEC_CMD=mv
ARCHIVE_DIRECTORY=/apps/data/archive/
res=`find $DIRECTORY_PATH -mtime $NO_OF_DAYS_ARCHIVE -name "*$FILE_FORMAT*" -type f|grep "$DIRECTORY_PATH[^/]*$" | grep -v '/rf/'`
How do I implement the same logic in Perl?
One way, using File::Find::Rule:
use warnings;
use strict;
use File::Copy;
use File::Find::Rule;
my $dir = 'src/';
my $arch_dir = 'archive/';
my $days = 10;
my $type = '*.txt';
if (! -d $arch_dir){
mkdir $arch_dir or die $!;
}
my @files = File::Find::Rule->file()
->name($type)
->mtime('> ' . (time() - $days*24*60*60))
->in($dir);
for (@files){
move $_, $arch_dir or die $!;
print "moved $_ to $arch_dir/\n";
}
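If you also need the other parts of the shell pipeline (only files directly under the directory, and nothing under /rf/), a sketch along these lines should work; maxdepth limits the search to the top level of $dir:
my @candidates = File::Find::Rule->file()
                                 ->name($type)
                                 ->maxdepth(1)
                                 ->in($dir);
# mirror the grep -v '/rf/' from the shell version
@candidates = grep { $_ !~ m{/rf/} } @candidates;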

"No Such File Error" when trying to open each fasta file stored in an array

How can I open each file in a folder in sequential order, perform a regex search on the contents of each file, and store the matches in another array?
Here is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
my $dir = ("/path/to/folder");
my @ArrayofFiles;
my @TrimmedSequences;
opendir( my $dh, $dir ) || die;
#make an array of fasta files from a folder
while ( readdir $dh ) {
chomp;
my $fileName = $_;
if ($fileName =~ /\.fasta.*/) {
push(@ArrayofFiles, $fileName);
}
}
#this diagnostic print statement shows that I do get the proper files into the target array. I leave it commented out when I run the script.
#print join("\n", @ArrayofFiles), "\n";
#now I want to open each file in the array, search file contents, and add the result to another array
foreach my $file (@ArrayofFiles){
open (my $sequence, '<', $file) or die $!;
while (my $line = <$sequence>) {
if ($line =~ m/(CTCCCA)[TAGC]+(TCAGGA)/) {
push(@TrimmedSequences, $line);
}
}
}
When I run this code, I get the following error message:
"Uncaught exception from user code: No such file or directory at /Users/roblogan/Documents/BIOL6309/Manipulating fast5 files/Attempt 5 line 23."
Line 24 is "open (my $sequence, '<', $file) or die $!;"
My diagnostic print statement shows that I am working with an array of the expected fasta files.
I would be very grateful for any help I can get. Thank you so much.
-Rob
@ArrayofFiles just contains the filenames; it doesn't include the directory prefix. So you're trying to access the filenames in the current directory rather than the directory you listed.
Use:
push(@ArrayofFiles, "$dir/$fileName");
to get the full path.
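Alternatively (my own sketch, not from the answer above), glob already returns paths with the directory included, so the readdir loop could be replaced by a single line:
# build the list of fasta files with the directory prefix already attached
my @ArrayofFiles = glob("$dir/*.fasta*");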
