capture column and print to file in perl - arrays

I have an array that is basically a list of group IDs. I'll use the array to put the ID in a proprietary Linux command using a foreach loop, and also use the array elements to name the files (each ID needs its output in its own separate file). I'm having some issues opening the file and either using AWK to find and print the columns, or using the split command, which I can't get working either. Here's a sample of what I have so far:
#!/usr/bin/perl -w
# Title: groupmem_pull.pl
# Purpose: Pull group membership from group IDs in array
use strict;
use warnings;

my $gpath = "/home/user/output";
my @grouparray = (
    "219",
    "226",
    "345",
    "12",
);

print "Checking Groups:\n";
foreach (@grouparray)
{
    print `sudo st-admin show-group-config --group $_ | egrep '(Group ID|Group Name)'`;
    print `sudo st-admin show-members --group $_ > "$gpath/$_.txt"`;
    #print `cat $gpath/$_`;
    #print `cat $gpath/$_ | awk -F"|" '{print $2}' >`;
    open (FILE, "$gpath/$_.txt") || die "Can't open file\n: $!";
    while (my $groupid = <FILE>) {
        `awk -F"|" '{print $2}' "$gpath/$_.txt" > "$gpath/$_.txt"`;
        #print `cat "$gpath/$_.txt" | awk -F"|" '{print $2}' > $_.txt`;
    }
}
Right now it's erring on the AWK piece, saying "Use of uninitialized value $2 in concatenation (.) or string at ./groupmem_pull.pl line 57, line 2." The output from the first commands basically puts each group ID pull in a text file separated with pipes. I'm having a hell of a time with this one, and I'm not able to get some of the samples I've found on Stack Overflow to work. Your help is appreciated!

I think AntonH is right about the error message. However, I also think it's possible the result of the program is not what you expect. I also agree that maybe a "pure Perl" solution might work even better if you eliminate the AWK component.
If I understand you correctly, you want to run this command for each group in @grouparray.
sudo st-admin show-members --group <group id>
From there, you read the second column, delimited by the pipe character, and output all values in that column to a file named <group>.txt in the $gpath folder.
If that's the case, I think something like this would work.
use strict;
use warnings;

my $gpath = "/home/user/output";
my @grouparray = qw(219 226 345 12);

print "Checking Groups:\n";
foreach (@grouparray)
{
    open my $FILE, '-|', qq{sudo st-admin show-members --group $_} or die $!;
    open my $OUT, '>', "$gpath/$_.txt" or die $!;
    while (<$FILE>) {
        # chomp; # if there are only two fields
        my ($field) = (split /\|/, $_, 3)[1];
        print $OUT $field, "\n";
    }
    close $OUT;
    close $FILE;
}

I would think that escaping the dollar signs in the string when they're not referring to a Perl variable would solve the problem (in your case, the $2 becomes \$2).
`awk -F"|" '{print \$2}' "$gpath/$_.txt" > "$gpath/$_.txt"`;

Hambone's code worked for me. AntonH also had a great suggestion which seemed to resolve the errors, but it caused the outputs to be blank. The best way was to simplify my original code by implementing some of Hambone's suggestion about how to pull out the column I needed via split instead of AWK.
Thank you!

Related

how to use awk command where the file is perl variable

I'm unable to run the awk command in a Perl script where my file is a variable.
I have tried different ways, such as using system() with awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' $download_content and storing the output in an array, but no luck.
$filter = `awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' $download_content`;
Here $download_content is a webpage, i.e. in HTML format, and I need to extract the line matching the search pattern and the line six lines after it.
Here is an example using IPC::Run3:
use IPC::Run3;
# [...]
my @cmd = ('awk', '/"">/{nr[NR]; nr[NR+6]}; NR in nr');
my $in = $download_content;
my $out;
run3 \@cmd, \$in, \$out;
$filter = $out;
Alternatively, you can do it in Perl (without calling awk):
my @lines = split /\n/, $download_content;
my %nr;
my $NR = 0;
my $filter = "";
for ( @lines ) {
    if ( /"">/ ) {
        $nr{$NR}++;
        $nr{$NR + 6}++;
    }
    $filter .= "$_\n" if $nr{$NR};
    $NR++;
}
When you write something like:
awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' XXXX
Then awk expects "XXXX" to be the name of a file that it should work on. But (as I understand it) that's not the situation that you have. It sounds to me as though "XXXX" is the actual data that you want to work on. In that case, you need to pipe the data into awk. The easiest option is to use echo:
echo XXXX | awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr'
You should be able to do the same thing using Perl's system() function.
But it might be simpler to reimplement your awk code as Perl.

Perl, matching files of a directory, using an array with part of these file names

So, I have this directory with files named like this:
HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
HG00119.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam.bai
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
And I have an input.txt file that contains, on each line:
NA20828
HG00119
As you can see, the input.txt file has the beginning of the names of the files inside the directory.
What I want to do is filter the files in the directory whose names (in this case just the beginning) appear inside input.txt.
I don't know if I was clear, but here is the code I've done so far.
use strict;
use warnings;
my @lines;
my @files = glob("*.mapped*");

open (my $input, '<', 'input.txt') or die $!;
while (my $line = <$input>) {
    push (@lines, $line);
}
close $input;
I used the glob to filter only the files with mapped in the name, since I have other files there that I don't want to look for.
I tried some foreach loops, tried grep and regex also, and I'm pretty sure I was going the right way; I think my mistake might be about scope.
I would appreciate any help guys! thanks!
OK, first off - your while loop is redundant. If you read from a filehandle in a list context, it reads the whole thing.
my @lines = <$input>;
will do the same as your while loop.
Now, for your patterns - you're matching one list against another list, but partial matches.
chomp ( @lines );
foreach my $file ( @files ) {
    foreach my $line ( @lines ) {
        if ( $file =~ m/$line/ ) { print "$file matches $line"; }
    }
}
(And yes, something like grep or map can do this, but I always find those two make my head hurt - they're neater, but they're implicitly looping so you don't really gain much algorithmic efficiency).
You can build a regular expression from the contents of input.txt like this
my @lines = do {
    open my $fh, '<', 'input.txt' or die $!;
    <$fh>;
};
chomp @lines;
my $re = join '|', @lines;
and then find the required files using
my @files = grep /^(?:$re)/, glob '*.mapped*';
Note that, if the list in input.txt contains any regex metacharacters, such as ., * or +, you will need to escape them, probably using quotemeta like this
my $re = join '|', map quotemeta, @lines;
and it may be best to do this anyway unless you are certain that there will never ever be such characters in the file.

Perl - read text file line by line into array

I have little experience with Perl; I'm trying to read a simple text file line by line and put all items into an array. Could you please help?
Text File:
AAA
BBB
CCC
DDD
EEE
I need to be able to access each element in the array by index, for example to get at the DDD element.
THX
Try this:
use strict;
use warnings;
my $file = "fileName";
open (my $FH, '<', $file) or die "Can't open '$file' for read: $!";
my @lines;
while (my $line = <$FH>) {
    push (@lines, $line);
}
close $FH or die "Cannot close $file: $!";
print @lines;
To access an array in Perl, use [] and $. The index of the first element of an array is 0. Therefore,
$lines[3] # contains DDD
see more here : http://perl101.org/arrays.html
open(my $fh, '<', $qfn)
or die("Can't open $qfn: $!\n");
my @a = <$fh>;
chomp @a;
As for the last paragraph, I don't know if you mean
$a[3]
or
my @matching_indexes = grep { $_ eq 'DDD' } 0..$#a;
Your "requirements" here seem extremely minimal and I agree with @ikegami it's hard to tell if you want to match against text in the array or print out an element by index. Perhaps if you read through perlintro you can add to your question and ask for more advanced help based on code you might try writing yourself.
Here is a command line that does what your question originally asked. If you run perldoc perlrun on your system it will show you the various command line switches you can use for perl one-liners.
-0400 - slurps the input (any -0 value of 0400 or above makes Perl read files whole; 0777 is the conventional choice)
-a - autosplits each input record into an array called @F, using whitespace as the split character; since the whole file is one record here, @F holds every whitespace-separated token
-n - run in an implicit while (<>) { } loop that reads input from the files on the command line
-E - execute the script between the quotes, with modern features like say enabled
So with lines.txt equal to your text file above:
`perl -0400 -a -n -E 'say $F[3];' lines.txt`
outputs: DDD

Perl text file grep

I would like to create an array in Perl of strings that I need to search/grep from a tab-deliminated text file. For example, I create the array:
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my @searchArray = ('10060\t', '10841\t', '11164\t');
I want to have a foreach loop to grep a text file with a format like this:
c18 10706 463029 K
c2 10841 91075 G
c36 11164 . B
c19 11257 41553 C
for each of the elements of the above array. In the end, I want to have a NEW text file that would look like this (continuing this example):
c2 10841 91075 G
c36 11164 . B
How do I go about doing this? Also, this needs to be able to work on a text file with ~5 million lines, so memory cannot be wasted (I do have 32GB of memory though).
Thanks for any help/advice in advance! Cheers.
Using a perl one-liner. Just translate your list of numbers into a regex.
perl -ne 'print if /\b(?:10060|10841|11164)\b/' file.txt > newfile.txt
You can search for alternatives by using a regexp like /(10060\t|10841\t|11164\t)/. Since your array could be large, you could create this regexp with something like
$searchRegex = '(' . join('|', @searchArray) . ')';
this is just a simple string, and so it would be better (faster) to compile it to a regexp by
$searchRegex = qr/$searchRegex/;
With only 5 million lines, you could actually pull the entire file into memory (less than a gigabyte if 100 chars/line), but otherwise, line by line you could search with this pattern as in
while (<>) {
    print if $_ =~ $searchRegex;
}
So I'm not the best coder but this should work.
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my $searchfile = 'file.txt';
my $outfile = 'outfile.txt';
my @searchArray = ('10060', '10841', '11164');
my @findArray;

open(READ, '<', $searchfile) || die $!;
while (<READ>)
{
    foreach my $searchArray (@searchArray) {
        if (/$searchArray/) {
            chomp ($_);
            push (@findArray, $_);
        }
    }
}
close(READ);

### For Console Print
#foreach (@findArray){
#    print $_."\n";
#}

open(WRITE, '>', $outfile) || die $!;
foreach (@findArray) {
    print WRITE $_."\n";
}
close(WRITE);

How can I extract just the elements I want from a Perl array?

Hey, I'm wondering how I can get this code to work. Basically I want to keep the lines of $filename only when they contain $user in the path:
open STDERR, ">/dev/null";
$filename=`find -H /home | grep $file`;
@filenames = split(/\n/, $filename);
for $i (@filenames) {
    if ($i =~ m/$user/) {
        # keep results
    } else {
        delete $i; # does not work.
    }
}
$filename = join ("\n", #filenames);
close STDERR;
I know you can delete like delete $array[index] but I don't have an index with this kind of loop that I know of.
You could replace your loop with:
@filenames = grep /$user/, @filenames;
There's no way to do it when you're using a foreach loop. But never mind. The right thing to do is to use File::Find to accomplish your task.
use File::Find 'find';
...
my @files;
my $wanted = sub {
    return unless /\Q$file/ && /\Q$user/;
    push @files, $_;
};
find({ wanted => $wanted, no_chdir => 1 }, '/home');
Don't forget to escape your variables with \Q for use in regular expressions.
BTW, redirecting your STDERR to /dev/null is better written as
{
local *STDERR;
open STDERR, '>', '/dev/null';
...
}
It restores the filehandle after exiting the block.
If you have a find that supports -path, then make it do the work for you, e.g.,
#! /usr/bin/perl
use warnings;
use strict;
my $user = "gbacon";
my $file = "bash";
my $path = join " -a " => map "-path '*$_*'", $user, $file;
chomp(my @filenames = `find -H /home $path 2>/dev/null`);
print map "$_\n", @filenames;
Note how backticks in list context give back a list of lines (including their terminators, removed above with chomp) from the command's output. This saves you having to split them yourself.
Output:
/home/gbacon/.bash_history
/home/gbacon/.bashrc
/home/gbacon/.bash_logout
If you want to remove an item from an array, use the multi-talented splice function.
my @foo = qw( a b c d e f );
splice( @foo, 3, 1 ); # Remove element at index 3; @foo is now qw( a b c e f )
You can do all sorts of other manipulations with splice. See the perldoc for more info.
As codeholic alludes to, you should never modify an array while iterating over it with a for loop. If you want to modify an array while iterating, use a while loop instead.
The reason for this is that for evaluates the expression in parens once and aliases each item in the result list. If the array changes underneath it, those aliases can end up pointing at the wrong elements, and chaos follows.
A while evaluates the condition each time through the loop, so you won't run into issues with pointers to non-existent values.
