Perl: load contents of string into array

I have a simple question I was hoping you guys can help shed some light on. I'm steadily learning Perl.
Say I have a very large string, for example take the output of:
our $z = `du -B MB /home`;
This will produce a string like the following:
1MB /home/debug/Music
1MB /home/debug/Downloads
20MB /home/debug
20MB /home/
What I would like to know is, how do I go about loading this string into an array with two columns, and n rows (where n is the number of lines in the du output)?
I was trying something like the following:
my $z1 = `du -B MB /home | tail -4`;
my @c0 = split(/\n/, $z1);
my $z2 = join(/\t/, @c0);
my @c2 = split(/\t/, $z2);
print @c2;
Which produces the following output:
1MB/home/debug/Music1MB/home/debug/Downloads20MB/home/debug20MB/home
I suppose I could use the substitution operator s///g to substitute the directories with null values and keep the sizes in one array, then null the sizes and keep the directories in a second array, and use one array as keys to the other.
Does anyone have any suggestions as to the best way to approach this?
Any help is appreciated.
Thanks,
Diego

#!/usr/bin/perl
my $z = `du -h -d 1 /home/*`;
my @array = split("\n", $z);
foreach my $ar (@array) {
    my @ar = split("\t", $ar);
    $ar = \@ar;    # $ar aliases the element of @array, so each line becomes an array ref
}
foreach my $entry (@array) {
    print $entry->[0];
    print $entry->[1];
}

You can probably try and store them in a hash as follows:
#!/usr/bin/perl
use strict;
use warnings;

my $data = '1MB /home/work 4MB /user/bin';
my %values = split(' ', $data);
foreach my $k (keys %values) {
    print "$k: $values{$k}\n";
}
exit 0;
Note that a literal ' ' as the first argument of split is special: it skips leading whitespace and then splits on any run of whitespace, which is exactly what we want here. The output of the above should be something like:
1MB: /home/work
4MB: /user/bin
You will have to work the original data into $data and check whether a hash works for you. One caveat: du sizes repeat (two directories can both be 20MB), and duplicate keys overwrite each other in a hash, so keying on the size can lose entries.

I'm sure you Perl veterans won't like this solution much, but I basically fell back on my *nix roots. I was having trouble with this earlier in a for loop, but I realized I can let the shell evaluate the pipes, as long as the answer is stored in a string.
my $z1 = `du -B MB /home | cut -f1`;
my $z2 = `du -B MB /home | cut -f2`;
my @a1 = split("\n", $z1);
my @a2 = split("\n", $z2);
Array @a1 holds values from the first string, @a2 holds values from the second.
My question to you guys is, is there a Perl equivalent to the Unix 'cut' utility, i.e., a way to split out the first tab-delimited field of a string? This is a problem I'll have to explore.
Diego

I'm honestly not sure what you're trying to accomplish. If you simply want to have an array with n elements, each of which is a string with two columns in it, you need look no further than
my @z1 = `du -B MB /home | tail -4`;
For example, the third line of your output could be accessed like this (remember Perl arrays are 0-based):
print $z1[2];
producing output
20MB /home/debug
More useful would be to capture each directory's size in a hash:
my %dir2size;
foreach (@z1) {
    my @line = split /\s+/;
    $dir2size{$line[1]} = $line[0];
}
foreach my $key (sort keys %dir2size) {
    print "$key: $dir2size{$key}\n";
}
which produces the output
/home/: 20MB
/home/debug: 20MB
/home/debug/Downloads: 1MB
/home/debug/Music: 1MB

Related

Generate array and subarray dynamically (perl)

I have several files that contain product install information.
I am able to grep for the elements that I want (for example, "version") from the files. And I end up with something like this:
instancefile1:ABC_version=1.2.3
instancefile1:XYZ_version=2.5
instancefile2:ABC_version=3.4.5
instancefile3:XYZ_version=1.1
Where the components are named ABC or XYZ.
What I'd like to do is take this grep output and run it through Perl to build arrays on the fly.
The first array would be composed of the instance number (pulled from the filenames) - above, I'd have an array that would have 1,2,3 as elements.
And inside each of those arrays would be the components that that particular instance has.
Full expected arrays and components from above:
array[0] = instancefile1    # can keep this named like the filename, or assign a name; does not matter
array[0][0] = ABC_version=1.2.3
array[0][1] = XYZ_version=2.5
array[1] = instancefile2
array[1][0] = ABC_version=3.4.5
array[2] = instancefile3
array[2][0] = XYZ_version=1.1
(I know my notation for referencing subarrays is not correct - I'm rusty on my perl.)
How might I do that?
(I've been doing it with just bash arrays and grep and then reiterating through the initial grep output with my first array and doing another grep to fill another array - but this seems like it is going through the data more than one time, instead of building it on the fly.)
The idea is for it to build each array as it sees it. It sees "fileinstance1", it stores the values to the right in that array, as it sees it. Then if it sees "fileinstance2", it creates that array and populates with those values, all in one pass. I believe perl is the best tool for this?
Unless you can guarantee that records with the same key will be next to each other, it's easier to start with a hash of arrays (HoA).
my %data;
while (<>) {
    chomp;
    my ($key, $val) = split /:/;
    push @{ $data{$key} }, $val;
}
Then convert to AoA:
my @data = values(%data);
Order-preserving:
my %data;
my @data;
while (<>) {
    chomp;
    my ($key, $val) = split /:/;
    push @data, $data{$key} = []
        if !$data{$key};
    push @{ $data{$key} }, $val;
}

Grep elements from array that exists in output

Is there any way to use grep to find only the elements that exist in a specific array?
For example :
my @IPS = {"10.20.30","12.13.14","30.40.50"};
my $cmd = `netstat -Aa | grep -c IPS[0] OR IPS[1] OR IPS[2]`;
print "$cmd";
I want cmd to return the number of IPS (only those found in the array) that exists in the output of netstat command.
I know I can use " | " or condition but assume that I do not know the number of elements in array.
Your @IPS array does not contain what you think it contains: the braces create an anonymous hash reference, so you get a one-element array. I think you probably wanted:
my @IPS = ("10.20.30","12.13.14","30.40.50");
And I'd write that as:
my @IPS = qw(10.20.30 12.13.14 30.40.50);
I know I can use " | " or condition but assume that I do not know the number of elements in array
I don't think that matters at all.
# Need quotemeta() to escape the dots
my $IP_str = join '|', map { quotemeta $_ } @IPS;
my $IP_re  = qr/$IP_str/;

# Keep as much of the processing as possible in Perl-space
my @found = grep { /$IP_re/ } `netstat -Aa`;
say scalar @found;
An alternative to the regex is to turn @IPS into a hash.
my %IP = map { $_ => 1 } @IPS;
my @found = grep { $IP{$_} } `netstat -Aa`;
say scalar @found;
Update: Actually, that last example doesn't work; you would need to extract the IP addresses from the netstat output before matching against the hash. But I've left it there in case it inspires anyone to expand it.
A quick and dirty way to do this is by utilizing the $" variable in Perl:
my @IPS = ("10.20.30","12.13.14","30.40.50");
local $" = '|';
my $cmd = `netstat -Aa | grep -E '@IPS' | wc -l`;
print "$cmd";
The idea is that $" controls how an array appears when interpolated inside quotes. So by setting $" = '|' we make @IPS interpolate as its elements joined by | (grep needs -E, or egrep, for that alternation to work). Note: you should only use this if you trust the source of @IPS, e.g. if you type it yourself in the code. If it comes from an untrusted external source, the above trick can be dangerous.
Edit: If @IPS does come from an untrusted source, you can validate it first like so
/^[\d.]+$/ or die "Invalid IP address" for @IPS;

capturing a file into array and using that array in while loop

Hi, I have to capture a file into an array and then pass that array into a while loop.
I just don't want to run my script with the while loop below, because it is taking a long time...
while read line; do
some actions...
done < file.txt
My server has 8 GB of RAM, of which 6 GB is always available. So please let me know whether it is a good idea to capture a 100 MB file into memory (an array) and do operations like grep, sed, awk, etc. on it.
If so, please let me know how to capture the file into an array.
If not, kindly suggest another way to increase performance.
I'm not sure I understand...
Do you need something like this ?
array=()

# Read the file given as a parameter and fill the array named "array"
getArray() {
    i=0
    while read line
    do
        array[i]=$line
        i=$(($i + 1))
    done < $1
}

getArray "file.txt"

for line in "${array[@]}"
do
    : # some actions using $line
done
EDIT:
To answer your question, yes, it's possible to grep an array's contents and push the matches into another array. There is probably a better way to do it, but this works:
array2=()

# Split the string given as a parameter and push the values into the array
pushIntoArray() {
    i=0
    for element in $1
    do
        array2[i]=$element
        i=$(($i + 1))
    done
}

array1=("foo" "bar" "baz")

# Build a string of the array's elements separated by '\n' and pipe the output to grep
str=`printf "%s\n" "${array1[@]}" | grep "a"`

pushIntoArray "$str"
printf "%s\n" "${array2[@]}"    # display array2 line by line
Output of this snippet:
$ ./grep_array.sh
bar
baz
You want a while loop. Try this:
while read d; do
    echo $d
done < dinosaurs.txt
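If your bash is version 4 or newer, mapfile (also spelled readarray) captures a file into an array in one builtin call, with no explicit loop. A small sketch using a temporary file in place of file.txt:

```shell
# mapfile reads every line of the file into the named array; -t strips newlines.
tmp=$(mktemp)
printf 'line one\nline two\n' > "$tmp"
mapfile -t array < "$tmp"
echo "${#array[@]}"   # element count
echo "${array[1]}"
rm -f "$tmp"
```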
Enjoy

Search for, and remove column from CSV file

I'm trying to write a subroutine that will take two arguments, a filename and a column name inside a CSV file. The subroutine will search for the column name, remove that column (or columns) from the CSV file, and then return the CSV data with the columns removed.
I feel like I've gotten through the first half of this sub (opening the file, retrieve the headers and values) but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.
sub remove_columns {
    my @Para = @_;
    my $nargs = @Para;
    die "Insufficient arguments\n" if ($nargs < 2);
    open file, $file;
    $header = <file>;
    chomp $header;
    my @hdr = split ',', $header;
    while (my $line = <file>) {
        chomp $line;
        my @vals = split ',', $line;
        # hash that will allow me to access column names and values quickly
        my %h;
        for (my $i = 0; $i <= $#hdr; $i++) {
            $h{$hdr[$i]} = $i;
        }
        ....
    }
Here's where the search and removal will be done. I've been thinking about how to go about this; the CSV files that I'll be modifying will be huge, so speed is a factor, but I can't seem to think of a good way to go about this. I'm new to Perl, so I'm struggling a bit.
Here are a few hints that will hopefully get you going.
To remove the element at position $index of an array, use:
splice @array, $index, 1;
As speed is an issue, you probably want to construct an array of column numbers at the start and then loop over it. Splice in descending index order, so earlier removals don't shift the positions you have yet to remove:
for my $index (sort { $b <=> $a } @indices) {
    splice @array, $index, 1;
}
(this is more idiomatic Perl than a for (my $i = 0; $i <= $#hdr; $i++) style loop)
Another thing to consider: CSV is a surprisingly complicated format. Might your data have fields with a , inside quotes, such as
1,"column with a , in it"
I would consider using something like Text::CSV
You should probably look in the direction of Text::CSV
Or you can do something like this:
my $colnum;
my $header = <$file>;
chomp $header;    # without this, the last column name keeps its newline and never matches
my @columns = split(/,/, $header);
for (my $i = 0; $i < scalar(@columns); $i++) {
    if ($columns[$i] =~ /^$unwanted_column_name$/) {
        $colnum = $i;
        last;
    }
}
while (<$file>) {
    my @row = split(/,/, $_);
    splice(@row, $colnum, 1);
    # do something with the resulting array @row
}
Side notes: you really should use strict and warnings, and split(/,/, <$file>) won't work with all CSV files.
There is an elegant way to remove columns from an array. If the columns to remove are in array @cols, and the headers in @headers, I can make an array of indexes to preserve:
my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep !exists $to_delete{$headers[$_]}, 0 .. $#headers;
Then it's easy to make the new headers
@headers[@idxs]
and also a new row from the columns read
@columns[@idxs]
The same approach can be used, for example, for rearranging arrays. It is very fast and a pretty idiomatic Perl way to do this sort of task.

Bash: Split a string into an array

First of all, let me state that I am very new to Bash scripting. I have tried to look for solutions for my problem, but couldn't find any that worked for me.
Let's assume I want to use bash to parse a file that looks like the following:
variable1 = value1
variable2 = value2
I split the file line by line using the following code:
cat /path/to/my.file | while read line; do
    echo $line
done
From the $line variable I want to create an array that I want to split using = as a delimiter, so that I will be able to get the variable names and values from the array like so:
$array[0] #variable1
$array[1] #value1
What would be the best way to do this?
Set IFS to '=' in order to split the string on the = sign in your lines, i.e.:
while IFS='=' read key value; do
    array[0]="$key"
    array[1]="$value"
done < file
Assignments must use the plain name (array[0]=..., not ${array[0]}=...), and redirect from the file rather than piping cat into the loop; a piped loop runs in a subshell, so the array would be lost when the loop ends.
You may also be able to use the -a argument to read into an array, i.e.:
while IFS='=' read -a array; do
    ...
done < file
depending on your bash version.
Old completely wrong answer for posterity:
Add the argument -d = to your read statement. Then you can do:
cat file | while read -d = key value; do
    $array[0]="$key"
    $array[1]="$value"
done
while IFS='=' read -r k v; do
    : # do something with $k and $v
done < file
IFS is the 'internal field separator', which tells bash to split the line on the '=' sign. Note that the spaces around the = stay attached to $k and $v, so trim them if you need the exact values.
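A runnable version of the IFS trick on the question's sample file (written to a temp file here), including trimming of the single spaces that read leaves on each side of the '=':

```shell
tmp=$(mktemp)
printf 'variable1 = value1\nvariable2 = value2\n' > "$tmp"
while IFS='=' read -r k v; do
    k=${k%% }   # drop the trailing space left of the '='
    v=${v## }   # drop the leading space right of the '='
    echo "key=$k value=$v"
done < "$tmp"
rm -f "$tmp"
```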
