I'm trying to take the values from the first column of a CSV file and arrange them into a matrix, ordered from lowest latitude/highest longitude (southwest) to highest latitude/lowest longitude (northeast).
This CSV file has hundreds of lines of data.
I know that I'm going to need to use sort to accomplish this, but I'm not sure if what I have so far is enough.
Sample:
18,49.000,-96.000
30,41.000,-109.000
65,31.000,-80.000
25,47.000,-75.000
45,37.000,-90.000
60,30.000,-100.000
70,30.000,-118.000
...
...
...
Bash Code:
sort -t',' -nr -k2 -k3
Result:
18,49.000,-96.000
25,47.000,-75.000
30,41.000,-109.000
45,37.000,-90.000
65,31.000,-80.000
60,30.000,-100.000
70,30.000,-118.000
Expected Matrix Setup:
18
25
30
45
70 60
65
If it is acceptable to use a general-purpose programming language like Perl, try something like:
perl -e '
    while (<>) {
        chomp;                                  # drop the trailing newline
        ($code, $lat, $long) = split(/,/);
        $lats{$lat}++;                          # collect distinct latitudes
        $longs{$long}++;                        # collect distinct longitudes
        $map{$lat, $long} = $code;
    }
    for $lat (sort {$b <=> $a} keys %lats) {        # north to south
        for $long (sort {$a <=> $b} keys %longs) {  # west to east
            $str = $map{$lat, $long} || " ";
            print $str . " ";
        }
        print "\n";
    }' samplefile
and the result looks like:
18
25
30
45
65
70 60
Note that #65 is not southernmost in your sample data.
Hope this helps.
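If only the row order is needed (north to south, then west to east within the same latitude), sort alone can produce it, provided each key is bounded to a single field. In the question's command, -k2 runs from field 2 to the end of the line and the global -r also reverses the longitude order. Here is a minimal sketch, assuming a POSIX sort; the helper name sort_nw_first is made up for illustration:

```sh
# Hypothetical helper: order CSV rows by latitude (descending),
# then by longitude (ascending, i.e. west to east).
# Reads the named files, or standard input when none are given.
sort_nw_first() {
    # LC_ALL=C so that "." is treated as the decimal point.
    LC_ALL=C sort -t, -k2,2nr -k3,3n "$@"
}
```

Used as sort_nw_first samplefile, this lists the rows in the same order in which the Perl script above prints them, but it does not lay them out as a grid.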
I have a string:
$string = "123456-9876";
I need to split it into an array as follows:
$string = [12,34,56,98,76]
Splitting it with split('-', $string) does not serve the purpose.
How could I do that in Perl?
Extract pairs of digits: (e.g. "1234-5678" ⇒ [12,34,56,78])
$string = [ $string =~ /\d\d/g ];
Extract pairs of digits, even if separated by non-digits: (e.g. "1234-567-8" ⇒ [12,34,56,78])
$string = [ $string =~ s/\D//rg =~ /../sg ];
Rather than splitting, you can capture all 2 digit numbers with this perl code,
$str = "123456-9876";
my @matches = $str =~ /\d{2}/g;
print "@matches\n";
Prints,
12 34 56 98 76
Another solution pairs up the digits regardless of where non-digits appear in the string, without mutating the original string:
$string = "1dd23-dsd--456-9-876";
while($string =~ /(\d).*?(\d)/g) {
print "$1$2 ";
}
Prints,
12 34 56 98 76
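For comparison, the same "drop the non-digits, then take the digits two at a time" idea can be sketched outside Perl with standard shell tools. This is only an illustration; the helper name digit_pairs is an assumption and not part of any answer above:

```sh
# Hypothetical helper: print the digits of $1 in groups of two, one per line.
digit_pairs() {
    # Delete every non-digit, then wrap the remaining digits at width 2.
    printf '%s\n' "$1" | sed 's/[^0-9]//g' | fold -w 2
}
```

Calling digit_pairs '1dd23-dsd--456-9-876' prints 12, 34, 56, 98 and 76 on separate lines. With an odd number of digits, the last line holds a single digit.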
I'm looking to search an array for the entry that starts with a given variable and take the next 2 characters of that entry.
array=( 7501 7302 8403 9904 )
if var = 73, result desired is 02
if var = 75, result desired is 01
if var = 84, result desired is 03
if var = 99, result desired is 04
Sorry if this is an elementary question, but I've tried variations of cut and grep and cannot find the solution.
Any help is greatly appreciated.
You can use this search function using printf and awk:
srch() {
    printf "%s\n" "${array[@]}" |
        awk -v s="$1" 'substr($1, 1, 2) == s { print substr($1, 3) }'
}
Then use it as:
srch 75
01
srch 73
02
srch 84
03
srch 99
04
Since bash arrays are sparse, even in older versions of bash that don't have associative arrays (mapping arbitrary strings as keys), you could have a regular array that has keys only for numeric indexes that you wish to map. Consider the following code, which takes your input array and generates an output array of that form:
array=( 7501 7302 8403 9904 )
replacements=( )                        # create an empty array to map source to dest
for arg in "${array[@]}"; do            # for each entry in our array...
    replacements[${arg:0:2}]=${arg:2}   # map the first two characters to the remainder.
done
Running declare -p replacements after the above code dumps a description of the resulting array:
# "declare -p replacements" will then print this description of the new array generated...
# ...by the code given above:
declare -a replacements='([73]="02" [75]="01" [84]="03" [99]="04")'
You can then trivially look up any entry in it as a constant-time operation that requires no external commands:
$ echo "${replacements[73]}"
02
...or iterate through the keys and associated values independently:
for key in "${!replacements[@]}"; do
    value=${replacements[$key]}
    echo "Key $key has value $value"
done
...which will emit:
Key 73 has value 02
Key 75 has value 01
Key 84 has value 03
Key 99 has value 04
Notes/References:
See the bash-hackers wiki on parameter expansion for understanding of the syntax used to slice the elements (${arg:0:2} and ${arg:2}).
See BashFAQ #5 or the BashGuide on arrays for more details on the syntax used above.
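For shells without arrays, a related parameter expansion, ${var#prefix}, strips a leading prefix and can serve the same lookup. The following is a minimal sketch in POSIX sh; the function name lookup and its calling convention are assumptions made for illustration:

```sh
# Hypothetical helper: lookup PREFIX ENTRY...
# Prints the remainder of every entry that starts with PREFIX.
lookup() {
    prefix=$1
    shift
    for entry in "$@"; do
        case $entry in
            # ${entry#"$prefix"} removes the leading PREFIX from the entry.
            "$prefix"*) printf '%s\n' "${entry#"$prefix"}" ;;
        esac
    done
}
```

For example, lookup 73 7501 7302 8403 9904 prints 02. Unlike the sparse-array approach, this scans every entry on each call instead of doing a direct lookup.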
I have a file named all_energy.out, and I am trying to sort it so that I can renumber the files in the directory by energy, lowest first. In other words, I want to build an array of file names and energies ordered from the lowest energy upward, much like sorting names by age.
Analogous Example:
Don 24
Jan 30
Sue 19
sorted to
Sue 19
Don 24
Jan 30
Example of all_energy.out file: The most negative value is the lowest energy.
Energy
0001_IP3_fullBinding_Rigid0001 -219.209742
0001_IP3_fullBinding_Rigid0002 -219.188106
0001_IP3_fullBinding_Rigid0003 -219.064542
0001_IP3_fullBinding_Rigid0004 -219.050730
0001_IP3_fullBinding_Rigid0005 -219.044573
0001_IP3_fullBinding_Rigid0006 -218.927479
0001_IP3_fullBinding_Rigid0007 -218.919717
0001_IP3_fullBinding_Rigid0008 -218.900923
0001_IP3_fullBinding_Rigid0009 -218.898945
0001_IP3_fullBinding_Rigid0010 -218.889269
0001_IP3_fullBinding_Rigid0011 -218.871619
0001_IP3_fullBinding_Rigid0012 -218.859429
0001_IP3_fullBinding_Rigid0013 -218.848516
0001_IP3_fullBinding_Rigid0014 -218.835355
0001_IP3_fullBinding_Rigid0015 -218.822244
0001_IP3_fullBinding_Rigid0016 -218.819328
0001_IP3_fullBinding_Rigid0017 -218.818431
0001_IP3_fullBinding_Rigid0018 -218.815494
0001_IP3_fullBinding_Rigid0019 -218.798388
0001_IP3_fullBinding_Rigid0020 -218.792151
Energy
0002_IP3_fullBinding_Rigid0001 -226.007998
0002_IP3_fullBinding_Rigid0002 -225.635657
The file names are given before the energy value, for example 0001_IP3_fullBinding_Rigid0001.mol2 is the name of the first file.
Example solution:
0002_IP3_fullBinding_Rigid0001 -226.007998
0002_IP3_fullBinding_Rigid0002 -225.635657
0001_IP3_fullBinding_Rigid0001 -219.209742
0001_IP3_fullBinding_Rigid0002 -219.188106
0001_IP3_fullBinding_Rigid0003 -219.064542
My current script is:
#!/usr/bin/perl
use strict;
use warnings;
print "Name of all total energy containing file:\n";
my $energy_file = <STDIN>;
chomp $energy_file;
my $inputfile_energy = $energy_file;
open (INPUTFILE_ENERGY, "<", $inputfile_energy) or die $!;
print map $inputfile_energy->[0],
sort { $a->[1] <=> $b->[1] }
map { [ $_, /(\d+)$/ ] }
<INPUTFILE_ENERGY>;
close $inputfile_energy;
At this point I am just trying to get the energies with their names to print in the correct order. Then I will loop through the files in the directory, and when a name matches one of the sorted energy names, that file will be renumbered.
Problems with your script:
/(\d+)$/ only matches digits (0-9) at the end of a line. Your file contains floating point numbers, so only digits after the decimal point will be matched. You could get away with /(\S+)$/ instead. (Actually, in your sample input there is a line with a trailing space, so let's make that /(\S+)\s*$/ instead)
$inputfile_energy is a filename, a scalar, and not a reference, so $inputfile_energy->[0] doesn't make sense. You use it as the expression in a map construction, and in a map EXPR, LIST construction, $_ refers to the current element of the list that is being iterated through, so you probably just meant to say $_->[0].
Your input contains a few lines -- all with the keyword Energy -- that don't have the same format as the other lines you want to sort and should be filtered out.
Putting this all together, I get working code when the penultimate statement looks like:
print map $_->[0],
    sort { $a->[1] <=> $b->[1] }
    map { [ $_, /(\S+)\s*$/ ] }
    grep /\d/,
    <INPUTFILE_ENERGY>;
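For comparison, the same filter-then-sort pipeline can be sketched outside Perl with standard tools: keep only the lines that contain a digit, then sort numerically on the second whitespace-separated field. This is only a sketch; the helper name sort_by_energy is an assumption:

```sh
# Hypothetical helper: sort_by_energy [FILE...]
# Drops lines without digits (the "Energy" headers) and sorts the rest
# by the second field, lowest (most negative) energy first.
sort_by_energy() {
    # LC_ALL=C so that "." is treated as the decimal point.
    cat "$@" | grep '[0-9]' | LC_ALL=C sort -k2,2n
}
```

It would be run as sort_by_energy all_energy.out and, like the Perl version, prints each whole line in sorted order.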
You can use a one-liner like this and run it from the command line:
perl -lnae 'push @arr, [$_, $F[1]] if $F[1]; END { print join "\n", map {$_->[0]} sort {$a->[1] <=> $b->[1]} @arr }' energy_file.txt
1) The -n switch wraps the code in a loop over all lines of the input file (energy_file.txt); the current line is available in the $_ variable.
2) The -a switch splits each line on whitespace and puts the non-empty fields into the @F array.
A less "idiomatic" solution could be:
@data = <DATA>;
my @table;
foreach (@data) {
    chomp;
    next unless /^0/;       # skip Energy lines (or any other cleaning test)
    @line = split /\s+/;
    push @table, [@line];   # build a 2d array
}
my @sortedTable = sort { $a->[1] <=> $b->[1] } @table;
foreach (@sortedTable) {
    printf(
        "%5s,%25s\n",
        $_->[0],
        $_->[1]
    );                      # some pretty printing
}
__DATA__
Energy
0001_IP3_fullBinding_Rigid0001 -219.209742
0001_IP3_fullBinding_Rigid0002 -219.188106
0001_IP3_fullBinding_Rigid0003 -219.064542
0001_IP3_fullBinding_Rigid0004 -219.050730
....
Try this:
print join "\n", sort {(split /\s+/,$a)[1] <=> (split /\s+/,$b)[1]} map{chomp $_; $_} <INPUTFILE_ENERGY>;
I need to know how to fill an array in Perl with random values. For example, I want to declare an array containing 10 elements smaller than 60. Can someone explain how to do it or point me to a guide? I would be very grateful.
I'm assuming you meant "ten non-negative integers less than 60".
With possibility of repeats:
my @rands = map { int(rand(60)) } 1..10;
For example,
$ perl -E'say join ",", map { int(rand(60)) } 1..10;'
0,28,6,49,26,19,56,32,56,16 <-- 56 is repeated
$ perl -E'say join ",", map { int(rand(60)) } 1..10;'
15,57,50,16,51,58,46,7,17,53
$ perl -E'say join ",", map { int(rand(60)) } 1..10;'
13,57,26,47,30,14,47,55,39,39 <-- 47 and 39 are repeated
Without possibility of repeats:
use List::Util qw( shuffle );
my @rands = (shuffle 0..59)[0..9];
For example,
$ perl -MList::Util=shuffle -E'say join ",", (shuffle 0..59)[0..9];'
13,50,8,21,11,24,28,51,55,38
$ perl -MList::Util=shuffle -E'say join ",", (shuffle 0..59)[0..9];'
1,0,58,46,47,49,52,33,5,13
$ perl -MList::Util=shuffle -E'say join ",", (shuffle 0..59)[0..9];'
19,43,45,49,23,53,2,38,59,35
You can simply do:
use feature 'say';
use Data::Dumper;
my @r = map int(rand(60)), 0..9;
say Dumper \@r;
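The int(rand(60)) idiom is not specific to Perl. As a rough sketch of the same technique from the shell, awk's rand() returns a value in [0, 1), so scaling by the limit and truncating gives integers from 0 up to limit-1. The helper name rand_ints and its two arguments are assumptions made for illustration:

```sh
# Hypothetical helper: rand_ints COUNT LIMIT
# Prints COUNT integers in the range 0 .. LIMIT-1, repeats allowed.
rand_ints() {
    awk -v n="$1" -v limit="$2" 'BEGIN {
        srand()                          # seed from the current time
        for (i = 0; i < n; i++)
            print int(rand() * limit)    # rand() is in [0, 1)
    }'
}
```

rand_ints 10 60 prints ten values below 60, one per line. Because srand() seeds from the time of day, two calls within the same second may print the same sequence.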
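You could take advantage of the fact that Perl does not return hash keys in any sorted order, and that the order usually differs from run to run. Hash ordering is not designed as a source of randomness, so treat this as a quick trick rather than a uniform shuffle. This fills an array with 10 distinct values below 60:
use warnings;
use strict;
my @nums = (0 .. 59);    # candidates smaller than 60
my %data;
$data{$_}++ foreach @nums;
my $count = 0;
my @random;
foreach (keys %data) {
    $count++;
    push @random, $_ if $count <= 10;
}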
I run a report between 2 CSV files. The last check I wish to do is to add together the matching elements of the 2 arrays (built up of unique values and occurrences), but I can't work out how to say: for each name that appears in both arrays, add the values together, to get the output below.
INPUT:
jon 22
james 12
ken 22
jack 33
jim 11
harry 7
dave 9
grant 12
matt 74
malc 12
INPUT1:
jon 2
james 1
ken 8
jack 5
jim 1
harry 51
dave 22
Desired Output:
jon 24
james 13
ken 30
jack 38
jim 12
harry 58
dave 31
grant 12
matt 74
malc 12
Code I have so far to create output from INPUT and INPUT1:
my %seen;
seek INPUT, 0, 0;
while (<INPUT>)
{
chomp;
my $line = $_;
my @elements = split (",", $line);
my $col_name = $elements[1];
#print " $col_name \n" if !
$seen{$col_name}++;
}
while ( my ( $col_name, $times_seen ) = each %seen ) {
my $loc_total = $times_seen * $dd;
print "\n";
print " $col_name \t\t : = $loc_total";
printf OUTPUT "%-34s = %15s\n", $col_name , " $loc_total ";
}
############## ###################
my %seen2;
seek INPUT1, 0, 0;
while (<INPUT1>)
{
chomp;
my $line = $_;
my @elements1 = split (",", $line);
my $col_name = $elements1[1];
my $col_type = $elements1[5];
$seen2{$col_name}++ if $col_type eq "YES";
}
while ( my ( $col_name, $times_seen2 ) = each %seen2 ) {
my $loc_total = $times_seen2 ;
print "\n $col_name \t\t= $loc_total";
printf OUTPUT "%-34s = %15s\n", $col_name , $times_seen2 ;
}
close INPUT;
Instead of using %seen, store the running total in the hash directly:
#!/usr/bin/perl
use warnings;
use strict;
my %count;
for my $file ('INPUT', 'INPUT1') {
    open my $IN, '<', $file or die "$file: $!";
    while (<$IN>) {
        my ($name, $num) = split;
        $count{$name} += $num;
    }
}
for my $name (sort { $count{$b} <=> $count{$a} } keys %count) {
    print "$name\t$count{$name}\n";
}
First, I'll assume that the input files are actual CSV files -- whereas your examples are just whitespace delimited. In other words:
jon,22
james,12
ken,22
jack,33
jim,11
harry,7
dave,9
grant,12
matt,74
malc,12
and
jon,2
james,1
ken,8
jack,5
jim,1
harry,51
dave,22
ASSUMING I'm correct, then your while loops will do the trick, with a couple of tweaks:
The first element of your @elements array has index 0, not 1. So the "key" here is at $elements[0], and the "value" is at $elements[1]. So you'd have something like:
my $col_name = $elements[0];
my $col_value = $elements[1];
Instead of incrementing %seen, it seems more useful to add the value, like so:
$seen{ $col_name } += $col_value;
In your while loop which iterates over INPUT1, do the same thing done in the first loop to extract data; also, don't use %seen2; instead, simply add to %seen as above:
my $col_name = $elements1[0];
my $col_value = $elements1[1];
$seen{$col_name} += $col_value;
Your totals will then be stored in %seen, so your final while loop is slightly modified:
while ( my ( $col_name, $times_seen2 ) = each %seen ) { # instead of %seen2
If your two processing loops are identical (and I see it's possible that they're not), then I'd suggest factoring them into a common subroutine. But that's a different matter.
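The tweaks above boil down to keeping one running total per name across both files. As a compact sketch of that idea outside Perl, again assuming comma-separated name,value lines, an awk associative array can hold the totals. The helper name sum_counts is an assumption:

```sh
# Hypothetical helper: sum_counts [FILE...]
# Adds up the second CSV column per name, keeping first-seen order.
sum_counts() {
    awk -F, '
        NF < 2 { next }                      # skip blank or malformed lines
        !($1 in total) { order[++n] = $1 }   # remember first appearance
        { total[$1] += $2 }
        END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
    ' "$@"
}
```

Run as sum_counts INPUT INPUT1, it prints one "name total" line per name, in the order the names first appear.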
The following could easily be adapted to just take file names from the command line instead.
Maintains the order of the keys in your file:
use strict;
use warnings;
use autodie;
my @names;
my %total;
local @ARGV = qw(INPUT INPUT1);
while (<>) {
    my ($name, $val) = split;
    push @names, $name if ! exists $total{$name};
    $total{$name} += $val;
}
for (@names) {
    print "$_ $total{$_}\n";
}