skipping a line in an array, Perl - arrays

I'm relatively new to Perl and I've come across this project that I'm having a bit of a hard time with.
The object of the project is to compare two csv files, one of which would contain:
$name, $model, $version
and the other which would contain:
$name2,$disk,$storage
In the end, the RESULT file will contain the matched lines with the information combined like so:
$name, $model, $version, $disk,$storage.
I've managed to do this, but my problem is that when one of the elements is missing, the program breaks: when it encounters a line in the file that is missing an element, it stops at that line. How can I fix this? Any suggestions on how I can make it skip that line and continue?
Here's my code:
open( TESTING, '>testing.csv' ); # Names will be printed to this during testing. only .net ending names should appear
open( MISSING, '>Missing.csv' ); # Lines with missing name fields will appear here.
#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#my (@array) =<FILE>;
my @hostname; #stores names
#close FILE;
#***** TESTING TO SEE IF ANY OF THE LISTED ITEMS BEGIN WITH A COMMA AND DO NOT HAVE A NAME.
#***** THESE OBJECTS ARE PLACED INTO THE MISSING ARRAY AND THEN PRINTED OUT IN A SEPARATE
#***** FILE.
#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#test
if ( open( FILE, "file.txt" ) ) {
}
else {
die " Cannot open file 1!\n:$!";
}
$count = 0;
$x = 0;
while (<FILE>) {
( $name, $model, $version ) = split(","); #parsing
#print $name;
chomp( $name, $model, $version );
if ( ( $name =~ /^\s*$/ )
&& ( $model =~ /^\s*$/ )
&& ( $version =~ /^\s*$/ ) ) #if all of the fields are blank ( just a blank space)
{
#do nothing at all
}
elsif ( $name =~ /^\s*$/ ) { #if name is a blank
$name =~ s/^\s*/missing/g;
print MISSING "$name,$model,$version\n";
#$hostname[$count]=$name;
#$count++;
}
elsif ( $model =~ /^\s*$/ ) { #if model is blank
$model =~ s/^\s*/missing/g;
print MISSING"$name,$model,$version\n";
}
elsif ( $version =~ /^\s*$/ ) { #if version is blank
$version =~ s/^\s*/missing/g;
print MISSING "$name,$model,$version\n";
}
# Searches for .net to appear in field "$name" if match, it places it into hostname array.
if ( $name =~ /.net/ ) {
$hostname[$count] = $name;
$count++;
}
#searches for a comma in the name field, puts that into an array and prints the line into the missing file.
#probably won't have to use this, as I've found a better method to test all of the fields ( $name,$model,$version)
#and put those into the missing file. Hopefully it works.
#foreach $line (@array)
#{
#if($line =~ /^\,+/)
#{
#$line =~s/^\,*/missing,/g;
#$missing[$x]=$line;
#$x++;
#}
#}
}
close FILE;
for my $hostname (@hostname) {
print TESTING $hostname . "\n";
}
#for my $missing(@missing)
#{
# print MISSING $missing;
#}
if ( open( FILE2, "file2.txt" ) ) { #Run this if the open succeeds
#open outfile and print starting header
open( RESULT, '>resultfile.csv' );
print RESULT ("name,Model,version,Disk, storage\n");
}
else {
die " Cannot open file 2!\n:$!";
}
$count = 0;
while ( $hostname[$count] ne "" ) {
while (<FILE>) {
( $name, $model, $version ) = split(","); #parsing
#print $name,"\n";
if ( $name eq $hostname[$count] ) # I think this is the problem area.
{
print $name, "\n", $hostname[$count], "\n";
#print RESULT"$name,$model,$version,";
#open (FILE2,'C:\Users\hp-laptop\Desktop\file2.txt');
#test
if ( open( FILE2, "file2.txt" ) ) {
}
else {
die " Cannot open file 2!\n:$!";
}
while (<FILE2>) {
chomp;
( $name2, $Dcount, $vname ) = split(","); #parsing
if ( $name eq $name2 ) {
chomp($version);
print RESULT"$name,$model,$version,$Dcount,$vname\n";
}
}
}
$count++;
}
#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#test
if ( open( FILE, "file.txt" ) ) {
}
else {
die " Cannot open file 1!\n:$!";
}
}
close FILE;
close RESULT;
close FILE2;

I think you want next, which lets you finish the current iteration immediately and start the next one:
while (<FILE>) {
( $name, $model, $version ) = split(",");
next unless( $name && $model && $version );
...;
}
The condition that you use depends on what values you'll accept. In my examples, I'm assuming that all values need to be true. If they just need to not be the empty string, you could check the length instead:
while (<FILE>) {
( $name, $model, $version ) = split(",");
next unless( length($name) && length($model) && length($version) );
...;
}
If you know how to validate each field, you might have subroutines for those:
while (<FILE>) {
( $name, $model, $version ) = split(",");
next unless( length($name) && is_valid_model($model) && length($version) );
...;
}
sub is_valid_model { ... }
Now you just need to decide how to integrate that into what you are already doing.
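As a minimal, self-contained sketch of the next approach (the sample rows and the complete_rows helper are invented for illustration, not taken from the question), the loop below reads from an in-memory filehandle and skips any line that does not have all three fields:

```perl
use strict;
use warnings;

# Hypothetical helper: return the rows of a CSV-like string that have
# all three fields non-empty, skipping incomplete lines with `next`.
sub complete_rows {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;

        # The -1 limit keeps trailing empty fields instead of dropping them.
        my ( $name, $model, $version ) = split /,/, $line, -1;

        # Skip the line if any field is missing or empty.
        next unless length( $name // '' )
            && length( $model // '' )
            && length( $version // '' );

        push @rows, [ $name, $model, $version ];
    }
    close $fh;
    return @rows;
}

# Sample data: the second and third lines are incomplete and get skipped.
my @rows = complete_rows("a.net,m1,v1\n,m2,v2\nb.net,,v3\nc.net,m4,v4\n");
print join( ',', @$_ ), "\n" for @rows;
```

Note that this uses a plain split only to keep the sketch dependency-free; it has the usual weaknesses of split on real CSV data (quoted fields, embedded commas).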

You should start by adding use strict and use warnings to the top of your program, and declaring all variables with my at their point of first use. That will reveal a lot of simple mistakes that are otherwise difficult to spot.
You should also use the three-parameter form of open and lexical filehandles. The Perl idiom for checking for errors when opening files is to add or die to the open call; if statements with an empty block for the success path waste space and are hard to read. An open call should look like this:
open my $fh, '>', 'myfile' or die "Unable to open file: $!";
Finally, it is much safer to use a Perl module when you are handling CSV files as there are a lot of pitfalls in using a simple split /,/. The Text::CSV module has done all the work for you and is available on CPAN.
Your problem is that, having read to the end of the first file, you don't rewind or reopen it before reading from the same handle again in the nested loop. That means no more data will be read from that file, and the program will behave as if it is empty.
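To illustrate the point about the exhausted handle, here is a small sketch (the in-memory data and the count_lines helper are made up for the example). A second read loop on the same handle sees nothing until the handle is rewound with seek:

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines still unread on a filehandle.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    while ( defined( my $line = <$fh> ) ) {
        $n++;
    }
    return $n;
}

my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $first  = count_lines($fh);    # reads to the end of the file
my $second = count_lines($fh);    # handle is already at EOF, nothing to read

seek $fh, 0, 0 or die "Cannot rewind: $!";
my $third = count_lines($fh);     # after the rewind the data is readable again

print "first=$first second=$second third=$third\n";
close $fh;
```

Rewinding works, but it still means one full pass over the file per lookup.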
It is a bad strategy to read through the same file hundreds of times just to pair up corresponding records. If the file is of a reasonable size, you should build a data structure in memory to hold the information. A Perl hash is ideal, as it allows you to look up the data corresponding to a given name instantly.
I have written a revision of your code that demonstrates these points. It would be awkward for me to test the code as I have no sample data, but if you continue to have problems please let us know.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1, eol => "\n" }); # eol makes print() terminate each record
my %data;
# Read the name, model and version from the first file. Write any records
# that don't have the full three fields to the "MISSING" file
#
open my $f1, '<', 'file.txt' or die qq(Cannot open file 1: $!);
open my $missing, '>', 'Missing.csv'
or die qq(Unable to open "MISSING" file for output: $!);
# Lines with missing name fields will appear here.
while ( my $line = $csv->getline($f1) ) {
my $name = $line->[0];
if ( ( grep { defined($_) && length($_) } @$line ) < 3 ) { # fewer than three non-empty fields
$csv->print($missing, $line);
}
else {
$data{$name} = $line if $name =~ /\.net$/i;
}
}
close $missing;
# Put a list of .net names found into the testing file
#
open my $testing, '>', 'testing.csv'
or die qq(Unable to open "TESTING" file for output: $!);
# Names will be printed to this during testing. Only ".net" ending names should appear
print $testing "$_\n" for sort keys %data;
close $testing;
# Read the name, disk and storage from the second file and check that the line
# contains all three fields. Remove the name field from the start and append
# to the data record with the matching name if it exists.
#
open my $f2, '<', 'file2.txt' or die qq(Cannot open file 2: $!);
while ( my $line = $csv->getline($f2) ) {
next unless ( grep { defined($_) && length($_) } @$line ) >= 3;
my $name = shift @$line;
next unless $name =~ /\.net$/i;
my $record = $data{$name};
push @$record, @$line if $record;
}
# Print the completed hash. Send each record to the result output if it
# has the required five fields
#
open my $result, '>', 'resultfile.csv' or die qq(Cannot open results file: $!);
$csv->print($result, [ qw( name Model version Disk storage ) ]); # print() expects an array reference
for my $name (sort keys %data) {
my $line = $data{$name};
if ( ( grep { defined($_) && length($_) } @$line ) >= 5 ) {
$csv->print($result, $data{$name});
}
}

Related

Reading a line from a file using perl

First off, I have to check for the existence of the pass and fail files in the subdirectories. Then I need to read the first line of each pass/fail file. I thought of using separate variables, $file and $file1, to tell them apart. I'm very new to Perl, so I know my approach is probably poor.
I'm trying to figure out how to extend my current code to read the files once I have checked that they exist.
use strict;
use File::Find 'find';
my $file = 'pass.txt';
my $file1 = 'fail.txt';
my @directory = ('ram1','ram2');
sub check
{
if ( -e $_ && $_ eq $file )
{
print "Found file '$_' in directory '$File::Find::dir'\n";
}
elsif ( -e $_ && $_ eq $file1 )
{
print "Found file '$_' in directory '$File::Find::dir'\n";
}
}
find (\&check,@directory);
Is it possible I use the code below for the first if condition? I know it doesn't work but I'm not sure what to do next as the fail and pass text are inside the directories.
if (open my $File::Find::dir, '<', $file){
my $firstLine = <$File::Find::dir>;
close $firstLine;
Any suggestions would be helpful!
If you just want to look just in ram1 and ram2, there's no point in using File::Find. That is used for recursive searches, meaning if you want to search all the subdirectories of ram1 and ram2. (And for that, I'd use File::Find::Rule over File::Find; it's much cleaner.)
my @dir_qfns = ( 'ram1', 'ram2' );
for my $dir_qfn (@dir_qfns) {
for my $fn ('pass.txt', 'fail.txt') {
my $file_qfn = "$dir_qfn/$fn";
open(my $fh, '<', $file_qfn)
or warn("Can't open \"$file_qfn\": $!\n"), next;
defined( my $first_line = <$fh> )
or warn("\"$file_qfn\" is empty\n"), next;
print("$file_qfn: $first_line");
}
}
If it's ok for a file to be missing, then you can ignore that error (ENOENT).
Similarly, you don't need to output an error message if the file is empty.
my @dir_qfns = ( 'ram1', 'ram2' );
for my $dir_qfn (@dir_qfns) {
for my $fn ('pass.txt', 'fail.txt') {
my $file_qfn = "$dir_qfn/$fn";
my $fh;
if (!open($fh, '<', $file_qfn)) {
warn("Can't open \"$file_qfn\": $!\n") if !$!{ENOENT}; # stay silent only when the file simply doesn't exist
next;
}
defined( my $first_line = <$fh> )
or next;
print("$file_qfn: $first_line");
}
}
if (open my $f, '<', 'pass.txt') {
my $firstLine = <$f>;
close $f;
}
OP's code does not make much sense. Perhaps OP is looking for something like the following:
use strict;
use warnings;
use feature 'say';
my $dir = shift || 'some_dir_to_start_from';
my @files = qw/pass.txt fail.txt/;
my $match = join '|', map { quotemeta($_) } @files; # escape the dots in the file names
my $regex = qr/\b($match)\b/;
files_lookup($dir,$regex);
exit 0;
sub files_lookup {
my $dir = shift;
my $re = shift;
for ( glob("$dir/*") ) {
files_lookup($_, $re) if -d; # pass the pattern on when recursing
next unless /$re/;
if( -f ) {
say "File: $_";
open my $fh, '<', $_
or die "Couldn't open $_: $!";
my $line = <$fh>;
say $line;
close $fh;
}
}
}

Save all Data in array, Filter out duplicated Data, Compare Data between arrays and Removed the matched Data

I have some problems with my script.
The problems are:
The value of $str or @matchedPath is sometimes blank when I print it. It is not random; it happens only for certain paths in the table.txt file, and I can't figure out why.
I don't know how to print output like the expected outcome below. I can't recover the correct location of the table.txt file for each path, because I put all the paths in an array, filtered it, and compared it against the matched table.txt locations; as a result, some locations are missing when printed.
Example paths contained in the /home/is/latest/table.txt file (the middle column is the wanted path):
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
Example paths contained in the /home/are/latest/table.txt file (the middle column is the wanted path):
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
e.g. list.txt file contains,
rty/b
uio/b/c
qwe/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
Expected outcome:
Unmatched Path : b/c/d/g
table.txt file location: /home/is/latest/table.txt
Unmatched Path : b/c/d/j
table.txt file location: /home/are/latest/table.txt
Below is my detailed script,
#!/usr/perl/5.14.1/bin/perl
# I want to make a script that automatically compare the path in table.txt with list.txt
#table.txt files is located under a parent directory and it differs in the subdirectory.
#There is about 10 table.txt files and each one of it need to compare with list.txt
#The objective is to print out the path that are not in the list.txt
use strict;
use warnings;
use Switch;
use Getopt::Std;
use Getopt::Long;
use Term::ANSIColor qw(:constants);
use File::Find::Rule;
use File::Find;
use File::Copy;
use Cwd;
use Term::ANSIColor;
my $path1='/home'; #Automatically search all table.txt file in this directory even in subdirectory
my $version='latest'; #search the file specified subdirectory e.g. /home/is/latest/table.txt and /home/are/latest/table.txt
my $path2='/list.text'; #there is about 10 table.txt files which contain specified paths in it.
$path1 =~ s/^\s+|\s+$//g;
$version =~ s/^\s+|\s+$//g;
$path2 =~ s/^\s+|\s+$//g;
my @files = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my @symlink_dirs = File::Find::Rule->directory->symlink->in($path1); #If the directory is a symlink, in my case 'latest' is a symlink directory
print colored (sprintf ("\n\n\tSUMMARY REPORT"),'bold','magenta');
print "\n\n_______________________________________________________________________________________________________________________________________________________\n\n";
if ($version eq "latest")
{
foreach my $dir (@symlink_dirs)
{
my @filess = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my $symDir=($dir."/"."table.txt");
$symDir =~ s/^\s+|\s+$//g;
my $wantedPath=$symDir;
my $path_1 = $wantedPath;
function($path_1);
}
}
else
{
for my $file (@files)
{
if ($file =~ m/.*$version.*/)
{
my $wantedPath=$file;
my $path_1 = $wantedPath;
function($path_1);
}
}
}
sub function
{
my $path_1 = $_[0];
open DATA, '<', $path_1 or die "Could not open $path_1: $!";
my $path_2 = "$path2";
open DATA1, '<', $path_2 or die "Could not open $path_2: $!";
################# FOCUSED PROBLEM AREA ##############################
my @matchedPath;
my @matched_File_Path;
my @unmatchedPath;
my @unmatched_File_Path;
my @s2 = <DATA1>;
while(<DATA>)
{
my $s1 = $_;
if ($s1 =~ /^#.*/)
{
next;
}
if ($s1 =~ /(.*)\s+(.*)\s+(.*)\s+/)
{
my $str=($2);
$str =~ s/\s+//g;
for my $s2 (@s2)
{
if ($s2 =~ /.*$str/)
{
push @matchedPath,$str;
push @matched_File_Path,$path_1;
print "matched Path: $str\n\t$path_1\n"; #I don't understand, sometimes I get empty $str value in this. Can anyone help me?
last;
}
else
{
#print "unmatch:$str\n\t$path_1\n";
push @unmatchedPath,$str;
@unmatched_File_Path,$path_1;
}
}
}
}
foreach (@unmatchedPath)
{print "unmatch path: $_\n";}
foreach (@matchedPath)
{print "\nmatch path: $_\n\n";}
foreach (@unmatched_File_Path)
{print "unmatch File Path: $_\n";}
foreach (@matched_File_Path)
{print "match File Path: $_\n";}
my @filteredUnmatchedPath = uniq(@unmatchedPath);
my @filteredUnmatched_IP_File_Path =uniq(@unmatched_IP_File_Path);
@filteredUnmatchedPath = grep {my $filteredPath = $_; not grep $_ eq $filteredPath, @matchedPath} @filteredUnmatchedPath;
}
print "@filteredUnmatchedPath\n";
print "@filteredUnmatched_IP_File_Path\n";
sub uniq
{
my %seen;
grep !$seen{$_}++, @_;
}
close(DATA);
close(DATA1);
print "_________________________________________________________________________________________________________________________________________________________\n\n";
I think using hashes is much simpler here.
Here's what I tried. You will have to replace @all_path with your own array containing every path where a table.txt file is present:
use strict;
use warnings;
my @all_path =("some/location/table.txt","some/location_2/table.txt");
my %table_paths;
my %list_paths;
foreach my $path (@all_path)
{
open (my $table, "<", $path) or die ("error opening file");
#we create hash, each key is a path
while (<$table>)
{
chomp;
#only process lines starting with "a" as it seems to be the format of this file
$table_paths{(split)[1]}=$path if (/^a/); #taking the 2nd element in each line
}
close $table;
}
open (my $list, "<", "list.txt") or die ("error opening file");
#we create hash, each key is a path
while (<$list>)
{
chomp;
$list_paths{$_}=1;
}
close $list;
#now we delete from table_paths the keys that also appear in list; what is left is unmatched
foreach my $key (keys %table_paths)
{
delete $table_paths{$key} if (grep {$_ =~ /\Q$key\E$/} (keys %list_paths)); #\Q...\E matches the key literally
}
#printing unmatched keys
print "unmatched: $_\nlocation: $table_paths{$_}\n\n" foreach keys %table_paths;
inputs
in some/location/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
in some/location_2/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
in list.txt
rty/b
uio/b/c
qwe/dummyName/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
output:
unmatched: b/c/d/g
location: some/location/table.txt
unmatched: b/c/d/j
location: some/location_2/table.txt

Create output files named after my hash keys and write the matching lines to each file

I have two files. The first contains number ranges, each with a version name; the second contains a list of numbers. From each line of the second file I take the 9-character number starting at position 11 and compare it with the ranges in the first file (the "range file"). I then print to the screen each version name and how many matches it had.
My first file looks like this
imb,folded ,655575645,827544086
imb,selfmail ,827549192,827572977
My second file looks like this
0026110795165557564528452972062
0026110795165557648628452974959
0026110795182749420290503162401
0026110795182749566690703875348
0026110795182750564290503365856
0026110795182751155490713282618
0026110795182751819190503415474
0026110795182752054790503331977
0026110795182752888194578410931
0026110795182753115893308242647
0026110795182753522398248322033
0026110795182753601890723246006
0026110795182754156995403760702
0026110795182754174597213102232
0026110795182754408698248770395
0026110795182754919290713221614
0026110795182755128698248922635
0026110795182755566790713334451
0026110795182755669490713213633
0026110795182755806390507009696
0026110795182756204890713212248
0026110795182756217690713273839
0026110795182756259998248961157
0026110795182756309595403769515
0026110795182756708894578164887
0026110795182756829090713282238
0026110795182757082791367220156
0026110795182757130090713274108
0026110795182757297798248934527
0026110795182757370277063564556
My output now looks like this
folded IMB Count: 15
No Matched IMB Count: 1
selfmail IMB Count: 14
I need to create files named after the version names in my first file, and then print to each file the original lines that matched that version. For instance, folded has 15 matches, so I need to print those original numbers from the file list to a file named folded.txt.
my code is
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
sub trimspaces {
my @argsarray = @_;
$argsarray[0] =~ s/^\s+//;
$argsarray[0] =~ s/\s+$//;
return $argsarray[0];
}
open(INPUT , "< D:\\Home\\emahou\\imbfilelist.txt") or die $!;
open(INPUT2 , "< D:\\Home\\emahou\\imbrange.txt") or die $!;
my $n;
my @fh;
my $value;
my @ranges;
my $isMatch;
my $printed;
my $fVersion;
my %versionHash=();
while (<INPUT2>) {
chomp;
my ($version, $from, $to) = (split /,/)[ 1, 2, 3 ];
push @ranges, [ $from, $to, trimspaces($version)];
if (!exists $versionHash{trimspaces($version)})
{
$versionHash{trimspaces($version)}=0;
}
}
$versionHash{"No Matched"}=0;
close INPUT2;
while (<INPUT>) {
$isMatch=0;
$n = substr($_,12-1,9);
for my $r (@ranges) {
if ( $n >= $r->[0] && $n <= $r->[1]) {
$fVersion=$r->[2];
if (exists $versionHash{$fVersion}) {
$versionHash{$fVersion}++;
}
$isMatch=1;
last;
}
}
if (!$isMatch) {
$versionHash{"No Matched"}++;
}
}
foreach my $key (keys %versionHash) {
print STDOUT "$key IMB Count: " . $versionHash{$key} . "\n";
}
close INPUT;
This seems to do as you ask.
It works by building a hash %filelist with keys from the second column of imbrange.txt and values from, to, fh (the output file handle) and count (the number of records that matched this range).
Then imbfilelist.txt is read a line at a time, the nine-digit code is extracted and compared with the from and to values of each element of the %filelist hash. If a match is found, the line is printed to the corresponding file handle and the counter is incremented. If the code from this line doesn't match any of the ranges, $none_matched is incremented for output in the summary.
use strict;
use warnings;
use 5.010;
use autodie;
chdir 'D:\Home\emahou';
# Build a hash of `version` strings with their `from` and `to` values
# (file names follow the question: the ranges are in imbrange.txt)
open my $fh, '<', 'imbrange.txt';
my %filelist;
while ( <$fh> ) {
chomp;
my ($version, $from, $to) = (split /\s*,\s*/)[1,2,3];
$filelist{$version} = { from => $from, to => $to };
}
# Open an output file for each item and set the count to zero
while ( my ($version, $info) = each %filelist ) {
open $info->{fh}, '>', "$version.txt";
$info->{count} = 0;
}
# Filter the data in the file list, printing to the
# appropriate output file and keeping count
open $fh, '<', 'imbfilelist.txt';
my $none_matched = 0;
while ( my $line = <$fh> ) {
next unless $line =~ /\S/;
chomp $line;
my $code = substr $line, 11, 9;
my $matched = 0;
while ( my ($version, $info) = each %filelist ) {
next unless $code >= $info->{from} and $code <= $info->{to};
print { $info->{fh} } $line, "\n";
++$info->{count};
++$matched;
}
++$none_matched unless $matched;
}
close $_->{fh} for values %filelist;
# Print the summary
while ( my ($version, $info) = each %filelist ) {
print "$version IMB Count: $info->{count}\n"
}
print "None matched IMB Count: $none_matched\n"
output
selfmail IMB Count: 14
folded IMB Count: 15
None matched IMB Count: 1
folded.txt
0026110795165557564528452972062
0026110795165557648628452974959
0026110795182749420290503162401
0026110795182749566690703875348
0026110795182750564290503365856
0026110795182751155490713282618
0026110795182751819190503415474
0026110795182752054790503331977
0026110795182752888194578410931
0026110795182753115893308242647
0026110795182753522398248322033
0026110795182753601890723246006
0026110795182754156995403760702
0026110795182754174597213102232
0026110795182754408698248770395
selfmail.txt
0026110795182754919290713221614
0026110795182755128698248922635
0026110795182755566790713334451
0026110795182755669490713213633
0026110795182755806390507009696
0026110795182756204890713212248
0026110795182756217690713273839
0026110795182756259998248961157
0026110795182756309595403769515
0026110795182756708894578164887
0026110795182756829090713282238
0026110795182757082791367220156
0026110795182757130090713274108
0026110795182757297798248934527

Perl retrieve index of array on regex match and print

I am looking to extract columns based on header names in a comma- (or tab-) delimited file. I have a regex, $Account_Name, that matches many possible header names, among others. I want to read the file's column headers, match them against $Account_Name, and print the matched column along with its data.
Here is my code:
open(FILE, "list_2.txt") or die "Cannot open file: $!";
my $Account_Name = qr/^Acct ID$|^Account No$|^Account$|^ACCOUNT NUMBER$|Account Number|Account.*?Number|^Account$|^Account #$|^Account_ID$|^Account ID$/i;
my $CLIENT = qr/^CLIENT_NAME$|^Account Long Name$|^ACCOUNT NAME$|^Account Name$|^Name$|portfolio.*?description|^Account Description$/i;
while (my $line = <FILE>) {
chomp $line;
my @array = split(/,/, $line);
my %index;
@index{@array} = (0..$#array);
my $Account_Name_ = $index{$Account_Name};
if (my ($matched) = grep $array[$_] =~ /$Account_Name/, 0..$#array) {
$Account_Name_ = $matched;
my $CLIENT_ = $index{$CLIENT};
if (my ($matched) = grep $array[$_] =~ /$CLIENT/, 0..$#array) {
$CLIENT_ = $matched;
print $array[$Account_Name_],",",$array[$CLIENT_],"\n";
}
}
}
close(FILE);
Data, list_2.txt
Account number,order_num,Name
dj870-1234,12334566,josh trust 1992
My Results
Account number,Name
Desired Output
Account number,Name
dj870-1234,josh
For some reason I am only able to print the column names based on the match. How can I grab the data as well?
You need to move your print statement so that it also outputs your data lines. These do not match the header patterns, so in the original code the print statement is never reached for them!
use warnings;
open(FILE, "list_2.txt") or die "Cannot open file: $!";
my $Account_Name = qr/^Acct ID$|^Account No$|^Account$|^ACCOUNT NUMBER$|Account Number|Account.*?Number|^Account$|^Account #$|^Account_ID$|^Account ID$/i;
my $CLIENT = qr/^CLIENT_NAME$|^Account Long Name$|^ACCOUNT NAME$|^Account Name$|^Name$|portfolio.*?description|^Account Description$/i;
my ($Account_Name_, $CLIENT_);
while (my $line = <FILE>) {
chomp $line;
my @array = split(/,/, $line);
if (my ($matched) = grep $array[$_] =~ /$Account_Name/, 0..$#array) {
$Account_Name_ = $matched;
if (my ($matched) = grep $array[$_] =~ /$CLIENT/, 0..$#array) {
$CLIENT_ = $matched;
}
}
print $array[$Account_Name_],",",$array[$CLIENT_],"\n";
}
close(FILE);
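A variation on the same idea, sketched under the assumption that the first line is always the header: work out the column indices once from the header, then slice every row with them. The extract_columns helper and the shortened patterns are illustrative only, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first column whose header matches each
# pattern, then return those columns from every line (header included).
sub extract_columns {
    my ( $lines, @patterns ) = @_;
    my @rows = map { [ split /,/, $_, -1 ] } @$lines;
    return unless @rows;
    my $header = $rows[0];

    my @wanted;
    for my $re (@patterns) {
        my ($idx) = grep { $header->[$_] =~ $re } 0 .. $#$header;
        die "No header matches $re\n" unless defined $idx;
        push @wanted, $idx;
    }

    my @out;
    for my $row (@rows) {
        # An array slice picks the wanted fields out of the row;
        # short rows yield empty strings for missing fields.
        push @out, join ',', map { defined $_ ? $_ : '' } @{$row}[@wanted];
    }
    return @out;
}

my @lines = (
    'Account number,order_num,Name',
    'dj870-1234,12334566,josh trust 1992',
);
print "$_\n" for extract_columns( \@lines, qr/Account.*?Number/i, qr/^Name$/i );
```

Because the indices are fixed after the header, data lines never need to match the header patterns at all.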

Compare MD5 from files in a directory against an array (perl)

I was checking out this link here: How could I write a Perl script to calculate the MD5 sum of every file in a directory?
It gets the MD5 of each file in a specified directory. What I want to do is take those MD5s and compare them against an array. This is what I have so far.
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
my $dirname = "./";
opendir( DIR, $dirname );
my @files = readdir(DIR);
closedir(DIR);
print "@files\n";
foreach my $file (@files) {
if ( -d $file || !-r $file ) { next; }
open( my $FILE, $file );
binmode($FILE);
print Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
my @array = ('667fc8db8e5519cacbf8f9f2af2e0b08');
if (@array ~~ $FILE) {
print "matches array", "\n";
} else {
print "doesnt match array", "\n";
}
}
system ( 'pause' )
But with this, I always get "doesnt match array", even when the digest matches the array perfectly. I can print @array and it shows the same MD5 value as the file, but it always says "doesnt match array"; I have never got it to say "matches array" for any file. Thank you for looking :)
EDIT:
This is what i have now.
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
my $dirname = "./";
opendir( DIR, $dirname );
my @files = readdir(DIR);
closedir(DIR);
print "@files\n";
foreach my $file (@files) {
next if -d $file || !-r $file;
open( my $FILE, $file );
binmode($FILE);
#print digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest;
my @array = ('667fc8db8e5519cacbf8f9f2af2e0b08');
if($digest eq $array[0]) {
print "matches array", "\n";
} else {
print "doesnt match array", "\n";
}
}
system ( 'pause' );
Thanks to all for your help. You guys are awesome ;)
Please do not use the smartmatch operator ~~. It was declared experimental in Perl 5.18, and its semantics are likely to change in the future.
The best solution is to create a hash of the fingerprints you know:
my %fingerprints;
$fingerprints{"667fc8db8e5519cacbf8f9f2af2e0b08"} = undef;
If you want to load a whole array of fingerprints into the hash so that we can easily test for existence, you can use a hash slice:
@fingerprints{@array} = ();
Next, we store the fingerprint of the current file in a variable:
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest;
Then we test if that $digest exists in the hash of fingerprints:
if (exists $fingerprints{$digest}) {
print "$digest for <$file> -- FOUND\n";
}
else {
print "$digest for <$file>\n";
}
Using a hash is usually faster than looping through an array (If you do multiple lookups).
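Put together, the lookup can be sketched as follows. To keep the example self-contained it digests strings with md5_hex rather than files, and the make_matcher helper is a name invented for this sketch:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build a lookup hash from a list of known digests
# and return a closure that reports whether some data matches one of them.
sub make_matcher {
    my @known = @_;
    my %fingerprints;
    @fingerprints{@known} = ();    # hash slice: keys only, values are undef
    return sub {
        my ($data) = @_;
        return exists $fingerprints{ md5_hex($data) } ? 1 : 0;
    };
}

# d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of empty input.
my $is_known = make_matcher('d41d8cd98f00b204e9800998ecf8427e');

print $is_known->('')      ? "matches\n" : "doesn't match\n";
print $is_known->('hello') ? "matches\n" : "doesn't match\n";
```

The hash is built once, so each test is a single lookup no matter how many fingerprints are known.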
Suggested complete program:
use strict;
use warnings;
use feature qw< say >;
use autodie; # automatic error handling
use Digest::MD5;
my ($dirname, $fingerprint_file) = @ARGV; # takes two command line arguments
length $dirname or die "First argument must be a directory name\n";
length $fingerprint_file or die "Second argument must be a file with fingerprints\n";
# load the fingerprints
my %fingerprints;
open my $fingerprints_fh, "<", $fingerprint_file;
while (<$fingerprints_fh>) {
chomp;
$fingerprints{$_} = undef;
}
close $fingerprints_fh;
opendir my $directory, $dirname;
while(my $file = readdir $directory) {
next if not -f "$dirname/$file"; # readdir returns bare names, so prepend the directory
open my $fh, "<:raw", "$dirname/$file";
my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;
if (exists $fingerprints{$digest}) {
say qq($digest "$file" -- FOUND);
}
else {
say qq($digest "$file");
}
}
closedir $directory;
Example invocation
> perl script.pl . digests.txt
Perhaps the following will be helpful:
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
use File::Basename;
my $dirname = './';
my %MD5s = (
'667fc8db8e5519cacbf8f9f2af2e0b08' => 1,
'8c0452b597bc2c261ded598a65b043b9' => 1
);
for my $file ( grep { !-d and -r } <$dirname*> ) {
open my $FILE, '<', $file or die $!;
binmode $FILE;
my $md5hexdigest = Digest::MD5->new->addfile($FILE)->hexdigest;
close $FILE;
print basename ($file), " md5hexdigest $md5hexdigest ";
if ( $MD5s{$md5hexdigest} ) {
print "matches hash", "\n";
}
else {
print "doesn't match hash", "\n";
}
}
Sample output:
XOR_String_Match.pl md5hexdigest 8c0452b597bc2c261ded598a65b043b9 matches hash
zipped.txt md5hexdigest d41d8cd98f00b204e9800998ecf8427e doesn't match hash
Like this:
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest;
then
if($digest eq $array[0])
By the way, it would maybe be slightly more idiomatic to say (earlier on in your code):
next if -d $file || !-r $file;
