im using a perl script to convert from JSON to csv. the sampple json is like this,
[{"id":100,
"primary-codes":["E0181V00","E0226V00"],
"substitute-codes":["E0181D00","E0226100"],
"fk-id":2294}]
and the perl code i used to convert this to csv is,
#!/usr/bin/perl
use utf8;
use warnings;
use strict;
use lib '.';
use JSON::PP qw(decode_json);
my $json;
{
local $/;
open my $fh, '<', 'output_array.json' or die $!;
$json = <$fh>;
}
my $perl = decode_json $json;
my $filename = 'sample.csv';
open(my $fh, '>>:encoding(UTF-8)', $filename) or die "Could not open file '$filename' $!";
say $fh 'nk_id,prim_cd,sub_cd,fk_id';
for (#$perl){
my $nk_id = '"' . $_->{"id"} . '"';
my $prim_cd= '"' . $_->{"primary-codes"} . '"';
my $sub_cd= '"' . $_->{"substitute-codes"} . '"';
my $fk_id= '"' . $_->{"fk-id"} . '"';
say $fh "$nk_id," . "$prim_cd," . "$sub_cd," . "$fk_id";
}
close $fh;
The output i get is like this,
nk_id,prim_cd,sub_cd,fk_id
100,ARRAY(0x201549f8),ARRAY(0x20154a88),2294
but i want it to be,
100,"E0181V00,E0226V00","E0181D00,E0226100",2294
i tried using ,
my $prim_cd = '"' . join ",", #{ $perl->[0]{"primary-codes"} } . '"';
But it just returns the count and not the elements.
Please help me out with this issue.
Thanks!
The one detail that subverted your honest attempt is the precedence. In
my $prim_cd = '"' . join ",", #{ $perl->[0]{"primary-codes"} } . '"';
the list #{ $perl->[0]...} is first concatenated (.) with ", and as the . operator imposes scalar context the number of elements in the list is used, yielding 2". Only then does join do its thing, "joining" the given list (which is a single element, 2")
Just add parenthesis
for (#$perl){
my $nk_id = '"' . $_->{"id"} . '"';
my $prim_cd = '"' . join(',', #{$_->{'primary-codes'}}) . '"';
my $sub_cd = '"' . join(',', #{$_->{'substitute-codes'}}) . '"';
my $fk_id = '"' . $_->{"fk-id"} . '"';
say $fh "$nk_id,$prim_cd,$sub_cd,$fk_id";
}
A few notes.
You forgot use feature qw(say); without which say won't work.
What is shown has no use of use utf8;, which is about the source file itself
What is shown doesn't need use lib; that specifies paths to be searched for modules. (It won't help in finding files, if that was the idea.). However, once we are at it ...
I avoid using . as it can create problems ranging from confusion to direct errors. For one, do you mean the current working directory, or the directory of the script? They aren't the same. Assuming that you use it for the script's directory (which it isn't), replace it with
use FindBin qw($RealBin);
use lib $RealBin;
To slurp a file you can also do
my $json = do {
local $/;
open my $fh, '<', 'output_array.json' or die $!;
<$fh>;
};
Related
How can I open each file in a folder in sequential order, perform a regex search on the contents of each file, and store the matches in another array?
Here is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
my $dir = ("/path/to/folder");
my #ArrayofFiles;
my #TrimmedSequences;
opendir( my $dh, $dir ) || die;
#make an array of fasta files from a folder
while ( readdir $dh ) {
chomp;
my $fileName = $_;
if ($fileName =~ /\.fasta.*/) {
push(#ArrayofFiles, $fileName);
}
}
#this diagnostic print statement shows that I do get the proper files into the target array. I leave it commented out when I run the script.
#print join("\n", #ArrayofFiles), "\n";
#now I want to open each file in the array, search file contents, and add the result to another array
foreach my $file (#ArrayofFiles){
open (my $sequence, '<', $file) or die $!;
while (my $line = <$sequence>) {
if ($line =~ m/(CTCCCA)[TAGC]+(TCAGGA)/) {
push(#TrimmedSequences, $line);
}
}
}
When I run this code, I get the following error message:
"Uncaught exception from user code: No such file or directory at /Users/roblogan/Documents/BIOL6309/Manipulating fast5 files/Attempt 5 line 23."
Line 24 is "open (my $sequence, '<', $file) or die $!;"
My diagnostic print statement shows that I am working with an array of the expected fasta files.
I would be very grateful for any help I can get. Thank you so much.
-Rob
#ArrayOfFiles just contains the filenames, it doesn't include the directory prefix. So you're trying to access the filenames in the current directory rather than the directory you listed.
Use:
push(#ArrayofFiles, "$dir/$fileName");
to get the full path.
I am stuck on how to loop a list over another list. Perhaps I am not searching for the right words but I am stuck and would appreciate any help on my code.
I went over this thread but am still getting errors running my script. How do I search a Perl array for a matching string?
the database file
chr1 1692239 1692249 AH_GARP2_comp198_c0_seq1
chr1 2233934 2233944 CS_GARP2_comp323_c0_seq1
chr1 5993434 5993444 CS_GARP2_comp635_c0_seq1
chr1 6198157 6198167 CS_GARP2_comp115_c0_seq1
chr1 6465781 6465791 JB_GARP2_comp560_c0_seq1
chr1 7827923 7827933 JB_GARP2_comp855_c0_seq1
chr1 7920939 7920949 AA_GARP2_comp614_c0_seq1
chr1 7964000 7964010 CS_GARP2_comp717_c0_seq1
chr1 9314857 9314867 AH_GARP2_comp237_c0_seq1
chr1 9654532 9654542 AH_GARP2_comp632_c0_seq1
the query file
name1 CS_GARP2_comp635_c0_seq1
name2 JB_GARP2_comp855_c0_seq1
name3 AH_GARP2_comp198_c0_seq1
name4 AH_GARP2_comp237_c0_seq1
My code
#!/usr/bin/perl5.16.2
use 5.16.2;
use lib '/users/ec1/perl5/lib/perl5/';
use warnings;
use strict;
my $filename = shift; ## database
my $filename2= shift; ## list of ids
open (DB, '<', $filename ) || die "Unable to open: $!";
open (I , '<', $filename2) || die "Unable to open: $!";
my #DB;
while (<DB>) {
chomp;
my #DB = split /\t/, $_; ## define as list
#print "#DB[0,1,2,3]\n";
}
while (my $line = <I>) {
chomp $line;
my ($name, $id) = split /\t/, $line;
if ($DB[3] =~ /$id/) {print "$DB[0]\t$DB[1]\t$DB[2]\t$DB[3]\n";
} else {print "na\n"}
}
Put DB into hash (%DB) to ease searches.
use strict; use warnings;
my $filename = shift; ## database
my $filename2= shift; ## list of ids
my %DB;
open (DB, '<', $filename ) || die "Unable to open: $!";
while (<DB>) {
chomp;
my #row = split( /[ \t]+/, $_); ## define as list
die "expected four items in db file - line $.\n" unless #row == 4; # expect four elements in a row
die "duplicate id in db file - line $." if exists $DB{$row[3]};
$DB{$row[3]} = \#row;
}
close DB;
open (I , '<', $filename2) || die "Unable to open: $!";
while (<I>) {
chomp;
my ($name, $id) = split /[ \t]+/, $_;
if(exists ($DB{$id})) {
my #row = #{$DB{$id}};
print join("\t", #row), "\n";
} else {
print "na\n"
}
}
close(I);
P.S. I have changed split pattern to ease tests of copy&paste data file content
The problem with your approach is that the my #DB inside while loop creates a lexical scope so it will always contain the last line's contents and that content will not be available outside the loop.
You should read the id file in to a hash map and check if the line for db file exists as a key in hash.
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
my ($dbfile, $idfile) = #ARGV;
open my $id_fh, '<', $idfile;
open my $db_fh, '<', $dbfile;
my %ids;
while (<$id_fh>) {
++$ids{$_} for (split /\s+/)[1]; # split and put column2 as key in %ids.
}
while (<$db_fh>) {
my $fld = (split /\s+/)[3]; # split and assign column4 to $fld
print "na\n" and next unless $ids{$fld}; # print "na" if fld is not in hash and move to next line
print "$_"; # print the line if column4 exits.
}
I have the folder "segmentation" where i need the use of ".purseg" files(x.purseg,y.purseg,z.purseg). They are kind of text files.
Their form is:
'0.1 4.5 speech_L1'
'4.7 9.2 speech_L2'
etc.
I also have the folder audio where i have the "audio": x.wav,y.wav,z.wav.
Each ".purseg" file matches a ".wav" file,they both have the same name.
For my script i have to get the information from the ".purseg" file and based on it i have to cut from the wav file the part that i need(get the speaker mentioned as speech_L2).I made a script that works if i have both ".purseg" and ".wav" file in the same folder but because i am working with a lot of data i need to fix my script in order to work with folders. Here is the script:
#! /usr/bin/perl -w
use List::MoreUtils qw(uniq);
use File::Path qw(make_path);
use File::Copy "cp";
use warnings;
my $directory = '/home/taurete/Desktop/diar_fem_fin/segmentation/';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR))
{
next unless ($file =~ m/\.purseg$/);
$file =~ s{\.[^.]+$}{};
push (#list1, $file);
# print "$file\n";
}
my $list=#list1;
# print "$list";
$i=0;
while ($i<$list)
{
my $nume1=$list[$i];
open my $fh, "$nume1.purseg" or die $!;
my #file_array;
while (my $line = <$fh>)
{
chomp $line;
my #line_array = split(/\s+/, $line);
push (#file_array, \#line_array);
}
my #arr=#file_array;
$cont1=0;
my $s1= #arr;
for (my $i=0;$i < $s1;$i++)
{
$directory="$nume1";
make_path($directory);
if ("speech_L2" eq "$arr[$i][2]")
{
my $directory = '/home/taurete/Desktop/data/audio/';
opendir (DIR, $directory) or die $!;
$interval = $arr[$i][1] - $arr[$i][0];
$speakername=$nume1._.$cont1;
`sox $nume1.wav ./$directory/$speakername.wav trim $arr[$i][0] $interval`;
$cont1++;
}
}
$i++;
}
Here is what i get:
Name "main::list" used only once: possible typo at ./spkfinal.pl line
23. Use of uninitialized value $nume1 in concatenation (.) or string at ./spkfinal.pl line 27. No such file or directory at ./spkfinal.pl
line 27.
To answer your question about Name "main::list" used only once: possible typo at ./spkfinal.pl line 23., change:
my $nume1=$list[$i];
to:
my $nume1=$list1[$i];
You do not have an array #list, but you do have an array #list1.
I think that will clear up your subsequent warnings, too.
I have 5 files in different directory. I am extracting the data's from all files and make it as new file.
Note: input each file as an array and extract the data by using for loop for each n every files. I want to make it as single for loop to take the files and process the rest
For file1 am using
foreach (#file)
{
my #temp = split(/\t/, trim($_));
push(#output, $temp[0] . "\t" . $temp[1] . "\n");
}
foreach(uniq(#output))
{
print $OUTPUTFILE $_;
}
I am doing this for five times to process five file. Can anyone help me on how to make it simple
Just wrap it in an outer loop, iterating over all five files:
for my $file ( #five_files ) {
open my $fh, '<', $file or die "Unable to open $file: $!";
my #file = <$fh>;
foreach (#file) {
my #temp = split(/\t/, trim($_));
push(#output, $temp[0] . "\t" . $temp[1] . "\n");
}
foreach(uniq(#output)) {
print $OUTPUTFILE $_;
}
}
Since you're interested in just the first two elements of #temp, the foreach #file loop can be simplified:
my #temp = split /\t/, trim($_), 2;
push #output, #temp, "\n" ;
What if you simplify things by flattening out your #file array with join.
Then you can just split it up and deal with the list.
Eg:
!/usr/bin/perl
my #file = ("file1\tfile3 ","file1\tfile3\tfile3 ","file2"); # Some test data.
my $in = join "\t", #file; # Make one string.
my #temp = split(" ", $in); # Split it on whitespace.
# Did it work?
foreach(#temp)
{
print "($_)\n"; # use () to see if we have any white spaces.
}
Might be a problem if you have spaces in your filenames though!
I was checking out this link here: How could I write a Perl script to calculate the MD5 sum of every file in a directory?
It gets the md5 of each file in a specified directory. What i want to do is take those md5's and compare them against an array. This is what i have so far.
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
my $dirname = "./";
opendir( DIR, $dirname );
my #files = readdir(DIR);
closedir(DIR);
print "#files\n";
foreach my $file (#files) {
if ( -d $file || !-r $file ) { next; }
open( my $FILE, $file );
binmode($FILE);
print Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
my #array = ('667fc8db8e5519cacbf8f9f2af2e0b08');
if (#array ~~ $FILE) {
print "matches array", "\n";
} else {
print "doesnt match array", "\n";
}
}
system ( 'pause' )
But with this, i always get doesnt match array no matter if it does match the array perfectly. I can print #array and it will even show the same md5 values of the file. But like i said it just always says "doesnt match array". ive never got it to say "matches array" on any file. Thank you for looking :)
EDIT:
This is what i have now.
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
my $dirname = "./";
opendir( DIR, $dirname );
my #files = readdir(DIR);
closedir(DIR);
print "#files\n";
foreach my $file (#files) {
next if -d $file || !-r $file;
open( my $FILE, $file );
binmode($FILE);
#print digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
Sdigest = Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
my #array = ('667fc8db8e5519cacbf8f9f2af2e0b08');
if($digest eq $array[0]) {
print "matches array", "\n";
} else {
print "doesnt match array", "\n";
}
}
system ( 'pause' );
Thanks to all for your help. You guys are awesome ;)
Please do not use smartmatch ~~. It was declared experimental in the latest release of Perl, and the semantics are likely to change in the future.
The best solution is to create a hash of the fingerprints you know:
my %fingerprints;
$fingerprints{"667fc8db8e5519cacbf8f9f2af2e0b08"} = undef;
If you want to load a whole array of fingerprints into the hash so that we can easily test for existence, you can use a hash slice:
#fingerprints{#array} = ();
Next, we store the fingerprint of the current file in a variable:
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest;
Then we test if that $digest exists in the hash of fingerprints:
if (exists $fingerprints{$digest}) {
print "$digest for <$file> -- FOUND\n";
}
else {
print "$digest for <$file>\n";
}
Using a hash is usually faster than looping through an array (If you do multiple lookups).
Suggested complete program:
use strict;
use warnings;
use feature qw< say >;
use autodie; # automatic error handling
use Digest::MD5;
my ($dirname, $fingerprint_file) = #ARGV; # takes two command line arguments
length $dirname or die "First argument must be a directory name\n";
length $fingerprint_file or die "Second argument must be a file with fingerprints\n";
# load the fingerprints
my %fingerprints;
open my $fingerprints_fh, "<", $fingerprint_file;
while (<$fingerprints_fh>) {
chomp;
$fingerprints{$_} = undef;
}
close $fingerprints_fh;
opendir my $directory, $dirname;
while(my $file = readdir $directory) {
next if not -f $file;
open my $fh, "<:raw", "$dirname/$file";
my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;
if (exists $fingerprints{$digest}) {
say qq($digest "$file" -- FOUND);
}
else {
say qq($digest "$file");
}
}
closedir $directory;
Example invocation
> perl script.pl . digests.txt
Perhaps the following will be helpful:
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
use File::Basename;
my $dirname = './';
my %MD5s = (
'667fc8db8e5519cacbf8f9f2af2e0b08' => 1,
'8c0452b597bc2c261ded598a65b043b9' => 1
);
for my $file ( grep { !-d and -r } <$dirname*> ) {
open my $FILE, '<', $file or die $!;
binmode $FILE;
my $md5hexdigest = Digest::MD5->new->addfile($FILE)->hexdigest;
close $FILE;
print basename ($file), " md5hexdigest $md5hexdigest ";
if ( $MD5s{$md5hexdigest} ) {
print "matches hash", "\n";
}
else {
print "doesn't match hash", "\n";
}
}
Sample output:
XOR_String_Match.pl md5hexdigest 8c0452b597bc2c261ded598a65b043b9 matches hash
zipped.txt md5hexdigest d41d8cd98f00b204e9800998ecf8427e doesn't match hash
Like this:
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
then
if($digest eq $array[0])
By the way, it would maybe be slightly more idiomatic to say (earlier on in your code):
next if -d $file || !-r $file;