How to use the awk command when the file is a Perl variable

I am unable to run the awk command in my Perl script because my input is a variable, not a file.
I have tried different ways, like using system(awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' $download_content) and storing the output in an array, but no luck:
$filter = `awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' $download_content`;
Here $download_content is a web page (i.e. in HTML format), and I need to extract each line containing the search pattern plus the sixth line after it.

Here is an example using IPC::Run3:
use IPC::Run3;
# [...]
my @cmd = ('awk', '/"">/{nr[NR]; nr[NR+6]}; NR in nr');
my $in = $download_content;
my $out;
run3 \@cmd, \$in, \$out;
$filter = $out;
Alternatively, you can do it in pure Perl (without calling awk):
my @lines = split /\n/, $download_content;
my %nr;
my $NR = 0;
my $filter = "";
for ( @lines ) {
    if ( /"">/ ) {
        $nr{$NR}++;
        $nr{$NR + 6}++;
    }
    $filter .= "$_\n" if $nr{$NR};
    $NR++;
}

When you write something like:
awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' XXXX
Then awk expects "XXXX" to be the name of a file that it should work on. But (as I understand it) that's not the situation that you have. It sounds to me as though "XXXX" is the actual data that you want to work on. In that case, you need to pipe the data into awk. The easiest option is to use echo:
echo XXXX | awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr'
You should be able to do the same thing using Perl's system() function.
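For example, here is a minimal sketch of streaming the variable's contents through awk from Perl (using a piped open rather than system(), so the filtered text can be collected; the /tmp/filtered.txt temp-file name is just for illustration):
# open a pipe to awk; the single-string form goes through the shell,
# so the redirection to the temp file works
open my $awk, '|-', q{awk '/"">/{nr[NR]; nr[NR+6]}; NR in nr' > /tmp/filtered.txt}
    or die "Cannot start awk: $!";
print $awk $download_content;
close $awk;
# slurp the filtered result back into a scalar
my $filter = do {
    local $/;
    open my $fh, '<', '/tmp/filtered.txt' or die $!;
    <$fh>;
};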
But it might be simpler to reimplement your awk code as Perl.

capture column and print to file in perl

I have an array that is basically a list of group IDs. I'll use the array to put each ID into a proprietary Linux command with a foreach loop, and also use the array elements to name the output files (each ID needs its output in its own separate file). I'm having some issues opening the file and either using AWK to find and print the columns, or using the split command, which I cannot get working either. Here's a sample of what I have so far:
#!/usr/bin/perl -w
# Title: groupmem_pull.pl
# Purpose: Pull group membership from group IDs in array
use strict;
use warnings;
my $gpath = "/home/user/output";
my @grouparray = (
    "219",
    "226",
    "345",
    "12",
);
print "Checking Groups:\n";
foreach (@grouparray)
{
    print `sudo st-admin show-group-config --group $_ | egrep '(Group ID|Group Name)'`;
    print `sudo st-admin show-members --group $_ > "$gpath/$_.txt"`;
    #print `cat $gpath/$_`;
    #print `cat $gpath/$_ | awk -F"|" '{print $2}' >`;
    open (FILE, "$gpath/$_.txt") || die "Can't open file\n: $!";
    while (my $groupid = <FILE>) {
        `awk -F"|" '{print $2}' "$gpath/$_.txt" > "$gpath/$_.txt"`;
        #print `cat "$gpath/$_.txt" | awk -F"|" '{print $2}' > $_.txt`;
    }
}
Right now it's erring on the AWK piece, saying "Use of uninitialized value $2 in concatenation (.) or string at ./groupmem_pull.pl line 57, <FILE> line 2." The output from the first commands basically puts each group ID's pull in a text file separated with pipes. I'm having a hell of a time with this one, and I'm not able to get some of the samples I've found on Stack Overflow to work. Your help is appreciated!
I think AntonH is right about the error message. However, I also think it's possible the result of the program is not what you expect. I also agree that maybe a "pure Perl" solution might work even better if you eliminate the AWK component.
If I understand you correctly, you want to run this command for each group in @grouparray.
sudo st-admin show-members --group <group id>
From there, you read the second column, delimited by the pipe character, and output all values in that column to a file named <group>.txt in the $gpath folder.
If that's the case, I think something like this would work.
use strict;
use warnings;
my $gpath = "/home/user/output";
my @grouparray = qw(219 226 345 12);
print "Checking Groups:\n";
foreach (@grouparray)
{
    open my $FILE, '-|', qq{sudo st-admin show-members --group $_} or die $!;
    open my $OUT, '>', "$gpath/$_.txt" or die $!;
    while (<$FILE>) {
        # chomp; # if there are only two fields
        my ($field) = (split /\|/, $_, 3)[1];
        print $OUT $field, "\n";
    }
    close $OUT;
    close $FILE;
}
I would think that escaping the dollar signs in the string when they don't refer to a Perl variable would solve the problem (in your case, the $2 becomes \$2).
`awk -F"|" '{print \$2}' "$gpath/$_.txt" > "$gpath/$_.txt"`;
Hambone's code worked for me. AntonH also had a great suggestion which seemed to resolve the errors, but it caused the outputs to be blank. The best way was to simplify my original code by implementing some of Hambone's suggestion about how to pull out the column I needed via split instead of AWK.
Thank you!

Perl text file grep

I would like to create an array in Perl of strings that I need to search/grep from a tab-delimited text file. For example, I create the array:
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my @searchArray = ('10060\t', '10841\t', '11164\t');
I want to have a foreach loop to grep a text file with a format like this:
c18 10706 463029 K
c2 10841 91075 G
c36 11164 . B
c19 11257 41553 C
for each of the elements of the above array. In the end, I want to have a NEW text file that would look like this (continuing this example):
c2 10841 91075 G
c36 11164 . B
How do I go about doing this? Also, this needs to work on a text file with ~5 million lines, so memory cannot be wasted (I do have 32GB of memory, though).
Thanks for any help/advice in advance! Cheers.
Using a Perl one-liner: just translate your list of numbers into a regex.
perl -ne 'print if /\b(?:10060|10841|11164)\b/' file.txt > newfile.txt
You can search for alternatives by using a regexp like /(10060\t|10841\t|11164\t)/. Since your array could be large, you could build this regexp with something like
$searchRegex = '(' . join('|', @searchArray) . ')';
This is just a simple string, so it would be better (faster) to compile it to a regexp with
$searchRegex = qr/$searchRegex/;
With only 5 million lines, you could actually pull the entire file into memory (less than a gigabyte at 100 chars/line), but otherwise you can search line by line with this pattern, as in
while (<>) {
    print if $_ =~ $searchRegex;
}
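Putting those pieces together, a minimal sketch (the file names are just placeholders):
my @searchArray = ('10060\t', '10841\t', '11164\t');
my $searchRegex = '(' . join('|', @searchArray) . ')';
$searchRegex = qr/$searchRegex/;    # the literal \t in each term becomes a tab in the regex
open my $in,  '<', 'file.txt'    or die $!;
open my $out, '>', 'newfile.txt' or die $!;
while (<$in>) {
    print $out $_ if $_ =~ $searchRegex;
}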
So I'm not the best coder but this should work.
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my $searchfile = 'file.txt';
my $outfile = 'outfile.txt';
my @searchArray = ('10060', '10841', '11164');
my @findArray;
open(READ, '<', $searchfile) || die $!;
while (<READ>)
{
    foreach my $searchArray (@searchArray) {
        if (/$searchArray/) {
            chomp ($_);
            push (@findArray, $_);
        }
    }
}
close(READ);
### For Console Print
#foreach (@findArray){
#    print $_."\n";
#}
open(WRITE, '>', $outfile) || die $!;
foreach (@findArray){
    print WRITE $_."\n";
}
close(WRITE);

Bash: Split a string into an array

First of all, let me state that I am very new to Bash scripting. I have tried to look for solutions for my problem, but couldn't find any that worked for me.
Let's assume I want to use bash to parse a file that looks like the following:
variable1 = value1
variable2 = value2
I split the file line by line using the following code:
cat /path/to/my.file | while read line; do
    echo $line
done
From the $line variable I want to create an array by splitting on = as a delimiter, so that I can get the variable name and value from the array like so:
$array[0] #variable1
$array[1] #value1
What would be the best way to do this?
Set IFS to '=' in order to split the string on the = sign in your lines, i.e.:
cat file | while IFS='=' read key value; do
    array[0]="$key"
    array[1]="$value"
done
You may also be able to use the -a argument to specify an array to write into, i.e.:
cat file | while IFS='=' read -a array; do
    ...
done
depending on your bash version.
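For example, a minimal sketch using -a (assuming one name = value pair per line):
while IFS='=' read -r -a array; do
    echo "name:  ${array[0]}"
    echo "value: ${array[1]}"
done < /path/to/my.file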
Old completely wrong answer for posterity:
Add the argument -d = to your read statement. Then you can do:
cat file | while read -d = key value; do
$array[0]="$key"
$array[1]="$value"
done
while IFS='=' read -r k v; do
    : # do something with $k and $v
done < file
IFS is the 'internal field separator', which tells bash to split the line on an '=' sign.
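Since the sample file has spaces around the =, the key and value read this way keep that whitespace; a minimal sketch of trimming it with parameter expansion:
while IFS='=' read -r k v; do
    k="${k%"${k##*[![:space:]]}"}"   # strip trailing whitespace from the key
    v="${v#"${v%%[![:space:]]*}"}"   # strip leading whitespace from the value
    echo "key=[$k] value=[$v]"
done < /path/to/my.file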

Checking for Duplicates in array

What's going on:
I've ssh'd into my localhost, run ls on the Desktop, and put those items into an array.
I hardcoded a short list of items, and I am comparing them with a hash to see if anything is missing from the host (see if something from A is NOT in B, and let me know).
So after figuring that out, when I print out the "missing files" I get a bunch of duplicates (see below). Not sure if that has to do with how the files are being checked in the loop, but I figured the best thing to do would be to just sort the data and eliminate dupes.
When I do that, and print out the fixed data, only one file is printing, two are missing.
Any idea why?
#!/usr/bin/perl
my $hostname = $ARGV[0];
my @hostFiles = ("filecheck.pl", "hostscript.pl", "awesomeness.txt");
my @output = `ssh $hostname "cd Desktop; ls -a"`;
my %comparison;
for my $file (@hostFiles) {
    $comparison{$file} += 1;
}
for my $file (@output) {
    $comparison{$file} += 2;
}
for my $file (sort keys %comparison) {
    @missing = "$file\n" if $comparison{$file} == 1;
    #print "Extra file: $file\n" if $comparison{$file} == 2;
    print @missing;
}
my @checkedMissingFiles;
foreach my $var ( @missing ){
    if ( ! grep( /$var/, @checkedMissingFiles) ){
        push( @checkedMissingFiles, $var );
    }
}
print "\n\nThe missing Files without dups:\n @checkedMissingFiles\n";
Password:
awesomeness.txt ##This is what is printing after comparing the two arrays
awesomeness.txt
filecheck.pl
filecheck.pl
filecheck.pl
hostscript.pl
hostscript.pl
The missing Files without dups: ##what prints after weeding out duplicates
hostscript.pl
The perl way of doing this would be:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %hostFiles = qw( filecheck.pl 1 hostscript.pl 1 awesomeness.txt 1 );
# ssh + backticks + ls, not the greatest way to do this, but that's another Q
my @files = `ssh $ARGV[0] "ls -a ~/Desktop"`;
# get rid of the newlines
chomp @files;
# grep returns the matching elements of @files
my %existing = map { $_ => 1 } grep { exists($hostFiles{$_}) } @files;
print Dumper([grep { !exists($existing{$_}) } keys %hostFiles]);
Data::Dumper is a utility module; I use it for debugging or demonstrative purposes.
If you want to print the list, you can do something like this:
{
    use English;
    local $OFS = "\n";
    local $ORS = "\n";
    print grep { !exists($existing{$_}) } keys %hostFiles;
}
$ORS is the output record separator (it's printed after any print) and $OFS is the output field separator, which is printed between the print arguments. See perlvar. You can get away with not using "English", but the variable names will look uglier. The block and the local are so you don't have to save and restore the values of the special variables.
If you want to write the result to a file, something like this would do:
{
    use English;
    local $OFS = "\n";
    local $ORS = "\n";
    open F, ">host_$ARGV[0].log";
    print F grep { !exists($existing{$_}) } keys %hostFiles;
    close F;
}
Of course, you can also do it the "classical" way: loop through the array and print each element:
open F, ">host_$ARGV[0].log";
for my $missing_file (grep { !exists($existing{$_}) } keys %hostFiles) {
    use English;
    local $ORS = "\n";
    print F "File is missing: $missing_file";
}
close F;
This allows you to do more things with the file name, for example, you can SCP it over to the host.
It seems to me that looping over the 'required' list makes more sense; looping over the list of existing files isn't necessary unless you're looking for files that exist but aren't needed.
#!/usr/bin/perl
use strict;
use warnings;
my @hostFiles = ("filecheck.pl", "hostscript.pl", "awesomeness.txt");
my @output = `ssh $ARGV[0] "cd Desktop; ls -a"`;
chomp @output;
my @missingFiles;
foreach (@hostFiles) {
    push( @missingFiles, $_ ) unless $_ ~~ @output;
}
print join("\n", "Missing files: ", @missingFiles);
@missing = "$file\n" assigns the array @missing to contain a single element, "$file\n". It does this every loop, leaving it with the last missing file.
What you want is push(@missing, "$file\n").
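Applied to the loop in the question (reusing its %comparison hash), a minimal sketch of the corrected version:
my @missing;
for my $file (sort keys %comparison) {
    push(@missing, "$file\n") if $comparison{$file} == 1;
}
print @missing;   # every missing file, once each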

How can I extract just the elements I want from a Perl array?

Hey, I'm wondering how I can get this code to work. Basically, I want to keep only the lines of $filename that contain $user in the path:
open STDERR, ">/dev/null";
$filename = `find -H /home | grep $file`;
@filenames = split(/\n/, $filename);
for $i (@filenames) {
    if ($i =~ m/$user/) {
        # keep results
    } else {
        delete $i; # does not work.
    }
}
$filename = join("\n", @filenames);
close STDERR;
I know you can delete with delete $array[index], but I don't have an index with this kind of loop, as far as I know.
You could replace your loop with:
@filenames = grep /$user/, @filenames;
There's no way to do it when you're using a foreach loop. But never mind. The right thing to do is to use File::Find to accomplish your task.
use File::Find 'find';
...
my @files;
my $wanted = sub {
    return unless /\Q$file/ && /\Q$user/;
    push @files, $_;
};
find({ wanted => $wanted, no_chdir => 1 }, '/home');
Don't forget to escape your variables with \Q for use in regular expressions.
BTW, redirecting your STDERR to /dev/null is better written as
{
    local *STDERR;
    open STDERR, '>', '/dev/null';
    ...
}
It restores the filehandle after exiting the block.
If you have a find that supports -path, then make it do the work for you, e.g.,
#! /usr/bin/perl
use warnings;
use strict;
my $user = "gbacon";
my $file = "bash";
my $path = join " -a " => map "-path '*$_*'", $user, $file;
chomp(my @filenames = `find -H /home $path 2>/dev/null`);
print map "$_\n", @filenames;
Note how backticks in list context give back a list of lines (including their terminators, removed above with chomp) from the command's output. This saves you having to split them yourself.
Output:
/home/gbacon/.bash_history
/home/gbacon/.bashrc
/home/gbacon/.bash_logout
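As a minimal illustration of that list-context behavior:
my @lines = `echo a; echo b`;   # @lines is ("a\n", "b\n")
chomp @lines;                   # now ("a", "b")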
If you want to remove an item from an array, use the multi-talented splice function.
my @foo = qw( a b c d e f );
splice( @foo, 3, 1 ); # Remove element at index 3.
You can do all sorts of other manipulations with splice. See the perldoc for more info.
As codeholic alludes to, you should never modify an array while iterating over it with a for loop. If you want to modify an array while iterating, use a while loop instead.
The reason for this is that for evaluates the expression in parens once, and maps each item in the result list to an alias. If the array changes, the pointers get screwed up and chaos will follow.
A while evaluates the condition each time through the loop, so you won't run into issues with pointers to non-existent values.
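For example, a minimal sketch of the while-loop approach for the filtering above, keeping only the paths that match $user:
my $i = 0;
while ( $i < @filenames ) {
    if ( $filenames[$i] =~ /\Q$user/ ) {
        $i++;                       # keep this element and move on
    } else {
        splice @filenames, $i, 1;   # remove it; do not advance $i
    }
}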
