Perl - read text file line by line into array - arrays

I have little experience with Perl. I'm trying to read a simple text file line by line and put all of its lines into an array. Could you please help?
Text File:
AAA
BBB
CCC
DDD
EEE
I need to access each element of the array by index, for example to get the DDD element.
THX

Try this:
use strict;
use warnings;
my $file = "fileName";
open (my $FH, '<', $file) or die "Can't open '$file' for read: $!";
my @lines;
while (my $line = <$FH>) {
    push @lines, $line;
}
close $FH or die "Cannot close $file: $!";
print @lines;
To access an array in Perl, use [] and $. The index of the first element of an array is 0. Therefore,
$lines[3] # contains DDD
see more here : http://perl101.org/arrays.html
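One thing to watch out for: each element read this way still ends in a newline, so $lines[3] is really "DDD\n". If you want to compare elements against bare strings like 'DDD', chomp the array after reading it. A minimal sketch (using a hard-coded stand-in for the file contents read above):

```perl
use strict;
use warnings;

# stand-in for the lines read from the file above
my @lines = ("AAA\n", "BBB\n", "CCC\n", "DDD\n", "EEE\n");

chomp @lines;           # strip the trailing newline from every element

print "$lines[3]\n";    # prints DDD
```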

open(my $fh, '<', $qfn)
or die("Can't open $qfn: $!\n");
my @a = <$fh>;
chomp @a;
As for the last paragraph, I don't know if you mean
$a[3]
or
my @matching_indexes = grep { $_ eq 'DDD' } 0..$#a;

Your "requirements" here seem extremely minimal, and I agree with @ikegami that it's hard to tell if you want to match against text in the array or print out an element by index. Perhaps if you read through perlintro you can add to your question and ask for more advanced help based on code you might try writing yourself.
Here is a command line that does what your question originally asked. If you run perldoc perlrun on your system it will show you the various command line switches you can use for perl one-liners.
-0400 - slurps the whole input at once (any -0 value above octal 0377 turns on slurp mode)
-a - autosplits each input record into an array called @F, splitting on whitespace
-n - run in an implicit while (<>) { } loop that reads input from the file(s) named on the command line
-E - execute the script between ' '
So with lines.txt equal to your text file above:
perl -0400 -a -n -E 'say $F[3];' lines.txt
outputs: DDD
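For comparison, here is roughly what that one-liner does, spelled out as an ordinary script (a sketch; it recreates the sample lines.txt itself so it is self-contained):

```perl
use strict;
use warnings;
use feature 'say';

# recreate the sample lines.txt from the question
open my $w, '>', 'lines.txt' or die $!;
print $w "AAA\nBBB\nCCC\nDDD\nEEE\n";
close $w;

# what -0400 -a -n -E 'say $F[3];' does, spelled out:
open my $fh, '<', 'lines.txt' or die $!;
my $record = do { local $/; <$fh> };   # -0400: slurp the whole file as one record
close $fh;

my @F = split ' ', $record;            # -a: autosplit the record on whitespace into @F
say $F[3];                             # the -E body: prints DDD
```

Running it prints DDD, the same as the one-liner.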

Related

Perl combine multiple file contents to single file

I have multiple log files, say file1.log, file2.log, file3.log, etc. I want to combine these files' contents into a single file called result_file.log.
Is there any Perl module which can achieve this?
Update: Here is my code
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
my @files;
my $dir = "/path/to/directory";

opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
    # We only want files
    next unless (-f "$dir/$file");
    # Use a regular expression to find files ending in .log
    next unless ($file =~ m/\.log$/);
    print "$file\n";
    push(@files, $file);
}
closedir(DIR);

print Dumper(\@files);

open my $out_file, ">result_file.log";
copy($_, $out_file) foreach (@files);
exit 0;
Do you think it is feasible solution?
File::Copy from CPAN should do the work; you will have to open the output file yourself.
use File::Copy;

open my $out, '>', 'result.log' or die $!;
copy($_, $out) foreach ('file1.log', 'file2.log');
close $out;
Update 1:
Based on the additional information posted in the question, it looks like the goal is to concatenate (in Perl) a list of files matching a pattern (*.log). The code below extends the solution above, using glob and thereby avoiding the readdir and filtering logic.
use File::Copy;

open my $out, '>', 'result.log' or die $!;
copy($_, $out) foreach glob('/path/to/dir/*.log');
close $out;
Important notes:
* Using glob will SORT the file names alphabetically, while readdir does NOT guarantee any order.
* The output file 'result.log' itself matches '*.log', so do not run this code in the same directory as the input files.
Do you think it is feasible solution?
I'm afraid not. Your code is the equivalent of typing these commands at your prompt:
$ cp file1.log result_file.log
$ cp file2.log result_file.log
$ cp file3.log result_file.log
$ ... etc ...
The problem with this is that it copies each file, in turn over the top of the previous one. So you end up with a copy of the final file in the list.
As I said in a comment, this is most easily done using cat - no need for Perl at all.
$ cat file1.log file2.log file3.log > result_file.log
If you really want to do it in Perl, then something like this would work (the first section is rather similar to yours).
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @files;
my $dir = "/path/to/directory";

opendir(my $dh, $dir) or die $!;
while (my $file = readdir($dh)) {
    # We only want files
    next unless (-f "$dir/$file");
    # Use a regular expression to find files ending in .log
    next unless ($file =~ m/\.log$/);
    print "$file\n";
    push(@files, "$dir/$file");
}
closedir($dh);

print Dumper(\@files);

open my $out_file, '>', 'result_file.log' or die $!;
foreach my $fn (@files) {
    open my $in_file, '<', $fn or die "$fn: $!";
    print $out_file $_ while <$in_file>;
}

How to read array line in sequence from Input

I have some trouble with arrays: how do I read the array lines in a particular sequence from the input?
Here's my code.
sites.txt: (input file)
site1
site2
site3
program
#!/usr/bin/perl
my $file = 'sites.txt';
open(my $fh, '<:encoding(UTF-8)', $file)
    or die "Couldn't open file !'$file' $!";
my @rows = <$fh>;
chomp @rows;
foreach my $site (@rows) {
    $sitename = $site;
    @domains = qw(.com .net .org);
    foreach $dns (@domains) {
        $domain = $dns;
        print "$site$dns\n";
    }
}
and the output is like this
site1.com
site1.net
site1.org
site2.com
site2.net
site2.org
site3.com
site3.net
site3.org
I understand it up to that point, but I want the first element of @domains to be applied to every input line first, then loop back to the first line of the input and move on to the next element of the array, so the output would be like this:
site1.com
site2.com
site3.com
site1.net
site2.net
site3.net
site1.org
site2.org
site3.org
Is it possible to do this, or do I need another module? Sorry for the basic question.
I'll really appreciate any answers.
Thanks :)
You are iterating over all your sites and then (for each site) appending each domain to the current site.
In pseudocode this is:
foreach site
    foreach domain
        print site + domain

Swap your loops so that the logic is

foreach domain
    foreach site
        print site + domain
Note that this is pseudocode, not Perl.
In "real" Perl this would look like:
#!/usr/bin/env perl
use strict;
use warnings;
my $file = 'sites.txt';
open( my $fh, '<:encoding(UTF-8)', $file )
or die "Couldn't open file !'$file' $!";
my @rows = <$fh>;
chomp @rows;

my @domains = qw(.com .net .org);

foreach my $dns (@domains) {
    foreach my $site (@rows) {
        print "$site$dns\n";
    }
}
Output
site1.com
site2.com
site3.com
site1.net
site2.net
site3.net
site1.org
site2.org
site3.org
Please always include use strict; and use warnings; at the top of your scripts. These two statements will show you the most common errors in your code.

confusing filehandle in perl

I have been playing with the following script but still couldn't understand the difference between the two "kinds" of filehandle forms. Any insight will be hugely appreciated.
#!/usr/bin/perl
use warnings;
use strict;

open (FH, "example.txt") or die $!;
while (<FH>) {
    my @line = split (/\t/, $_);
    print "@line", "\n";
}
The output is as expected: the @line array contains the elements from line 1, 2, 3 ... of example.txt. As I was told that open (FH, "example.txt") is not as good as open (my $fh, '<', 'example.txt'), I changed it, but then confusion arose.
From what I found, $fh is scalar and contains ALL info in example.txt. When I assigned an array to $fh, the array stored each line in example.txt as a component in the array. However, when I tried to further split the component into "more components", I got the error/warning message "use of uninitialized value". Below is the actual script that shows the error/warning message.
open (my $fh, '<', 'example.txt') or die $!;
foreach ($fh) {
    my @line = <$fh>;
    my $count = 0;
    for $count (0..$#line) {
        my @line2 = split /\t/, $line[$count];
        print "@line2";
        print "$line2[0]";
    }
}
print "@line2" shows the expected output, but print "$line2[0]" invokes the error/warning message. I thought that if @line2 is a true array, $line2[0] should be okay. But why "uninitialized value"?
Any help will be appreciated. Thank you very much.
Added -
the following is the "actual" script (I re-ran it and the warning was there)
#!/usr/bin/perl
use warnings;
use strict;

open (my $fh, '<', 'example.txt') or die $!;
foreach ($fh) {
    my @line = <$fh>;
    print "$line[1]";
    my $count = 0;
    for my $count (0..$#line) {
        my @line2 = split /\t/, $line[$count];
        print "@line2";
        #my $line2_count = $#line2;
        #print $line2_count;
        print "$line2[3]";
    }
}
The warning is still use of uninitialized value $line2[3] in string at filename.pl line 15, <$fh> line 3.
In your second example, you are reading the filehandle in a list context, which I think is the root of your problem.
my $line = <$fh>;
Reads one line from the filehandle.
my @lines = <$fh>;
Reads all the file.
Your former example, thanks to the
while (<FH>) {
is effectively doing the first case.
But in the second example, you are doing the second thing.
AFAIK, you should always use
while (<FH>) {
# use $_ to access the content
}
or better
while(my $single_line = <FH>) {
# use $single_line to access the content
}
because while reads line by line, whereas foreach first loads everything into memory and iterates over it afterwards.
readline returns undef on EOF or error; the check for undef is added by the interpreter when it is not done explicitly.
So with while you can process multi-gigabyte log files without any issue and without wasting RAM, which you can't do with for loops, since they require the whole array to be built before it is iterated.
At least this is how I remember it from a Perl book that I read some years ago.
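As for the warning in the question itself: split returns only as many fields as the line actually contains, so $line2[3] is undef on any line with fewer than four tab-separated fields. A defined check avoids the warning (a sketch with made-up sample lines):

```perl
use strict;
use warnings;

# two sample lines; the second has only two tab-separated fields
my @lines = ("a\tb\tc\td", "short\tline");

for my $line (@lines) {
    my @line2 = split /\t/, $line;
    # print the fourth field only when the line actually has one
    print "$line2[3]\n" if defined $line2[3];
}
```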

Perl, matching files of a directory, using an array with part of these file names

So, I have this directory with files named like this:
HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
HG00119.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
And I have an input.txt file that contains one ID per line:
NA20828
HG00119
As you can see, the input.txt file has the beginning of the name of the files inside the directory.
What I want to do is filter the files in the directory whose names contain (in this case, begin with) one of the strings inside input.txt.
I don't know if I was clear, but here is the code I've done so far.
use strict;
use warnings;
my @lines;
my @files = glob("*.mapped*");

open (my $input, '<', 'input.txt') or die $!;
while (my $line = <$input>) {
    push @lines, $line;
}
close $input;
I used the glob to filter only the files with mapped in the name, since I have other files there that I don't want to look for.
I tried some foreach loops, tried grep and regex also, and I'm pretty sure that I was going in the right way, and I think my mistake might be about scope.
I would appreciate any help guys! thanks!
OK, first off - your while loop is redundant. If you read from a filehandle in a list context, it reads the whole thing.
my @lines = <$input>;
will do the same as your while loop.
Now, for your patterns - you're matching one list against another list, but with partial matches.
chomp @lines;

foreach my $file (@files) {
    foreach my $line (@lines) {
        if ($file =~ m/$line/) { print "$file matches $line"; }
    }
}
(And yes, something like grep or map can do this, but I always find those two make my head hurt - they're neater, but they're implicitly looping so you don't really gain much algorithmic efficiency).
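For completeness, the grep version of that nested loop might look like this (a sketch, with the IDs and file names from the question hard-coded; index() checks that the ID sits at the very start of the file name):

```perl
use strict;
use warnings;

my @lines = qw(NA20828 HG00119);   # sample IDs from input.txt
my @files = (
    'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai',
    'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai',
    'NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam',
);

# keep only the files whose name starts with one of the IDs;
# the inner grep returns a true (non-zero) count when any ID matches
my @matched = grep {
    my $file = $_;
    grep { index($file, $_) == 0 } @lines;
} @files;

print "$_\n" for @matched;
```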
You can build a regular expression from the contents of input.txt like this
my @lines = do {
    open my $fh, '<', 'input.txt' or die $!;
    <$fh>;
};
chomp @lines;

my $re = join '|', @lines;
and then find the required files using
my @files = grep /^(?:$re)/, glob '*.mapped*';
Note that, if the list in input.txt contains any regex metacharacters, such as ., *, + etc. you will need to escape them, probably using quotemeta like this
my $re = join '|', map quotemeta, @lines;
and it may be best to do this anyway unless you are certain that there will never ever be such characters in the file.

capture column and print to file in perl

I have an array that is basically a list of group IDs. I'll use the array to put each ID into a proprietary Linux command using a foreach loop, and also use the array elements to name the files (each ID needs its output in its own separate file). I'm having some issues opening the file and either using AWK to find and print the columns, OR using the split command, which I cannot get working either. Here's a sample of what I have so far:
#!/usr/bin/perl -w
# Title: groupmem_pull.pl
# Purpose: Pull group membership from group IDs in array
use strict;
use warnings;

my $gpath = "/home/user/output";
my @grouparray = (
    "219",
    "226",
    "345",
    "12",
);

print "Checking Groups:\n";
foreach (@grouparray)
{
    print `sudo st-admin show-group-config --group $_ | egrep '(Group ID|Group Name)'`;
    print `sudo st-admin show-members --group $_ > "$gpath/$_.txt"`;
    #print `cat $gpath/$_`;
    #print `cat $gpath/$_ | awk -F"|" '{print $2}' >`;
    open (FILE, "$gpath/$_.txt") || die "Can't open file\n: $!";
    while (my $groupid = <FILE>) {
        `awk -F"|" '{print $2}' "$gpath/$_.txt" > "$gpath/$_.txt"`;
        #print `cat "$gpath/$_.txt" | awk -F"|" '{print $2}' > $_.txt`;
    }
}
Right now it's erroring on the AWK piece, saying "Use of uninitialized value $2 in concatenation (.) or string at ./groupmem_pull.pl line 57, line 2." The output from the first commands basically puts every group ID pull in a text file separated with pipes. I'm having a hell of a time with this one, and I'm not able to get some of the samples I've found on Stack Overflow to work. Your help is appreciated!
I think AntonH is right about the error message. However, I also think it's possible the result of the program is not what you expect. I also agree that maybe a "pure Perl" solution might work even better if you eliminate the AWK component.
If I understand you correctly, you want to run this command for each group in @grouparray.
sudo st-admin show-members --group <group id>
From there, you read the second column, delimited by the pipe character, and output all values in that column to a file named <group>.txt in the $gpath folder.
If that's the case, I think something like this would work.
use strict;
use warnings;

my $gpath = "/home/user/output";
my @grouparray = qw(219 226 345 12);

print "Checking Groups:\n";
foreach (@grouparray)
{
    open my $FILE, '-|', qq{sudo st-admin show-members --group $_} or die $!;
    open my $OUT, '>', "$gpath/$_.txt" or die $!;
    while (<$FILE>) {
        # chomp; # if there are only two fields
        my ($field) = (split /\|/, $_, 3)[1];
        print $OUT $field, "\n";
    }
    close $OUT;
    close $FILE;
}
I would think that escaping the dollar signs in the string when they are not referring to a Perl variable would solve the problem (in your case, the $2 becomes \$2).
`awk -F"|" '{print \$2}' "$gpath/$_.txt" > "$gpath/$_.txt"`;
Hambone's code worked for me. AntonH also had a great suggestion which seemed to resolve the errors, but it caused the outputs to be blank. The best way was to simplify my original code by implementing some of Hambone's suggestions about how to pull out the column I needed via split instead of AWK.
Thank you!
