I want to print up to 10 results per line and then after 10 force a new line. How would I do this?
This is the code:
my @email;
my @gender;
my @state;
while ( <> ) {
chomp;
my @fields = split /,/;
push @gender, $fields[5];
push @email, $fields[3];
push @state, $fields[4];
}
#records
print "There are $_ records in this file\n" for scalar (@email-1);
print "\n";
#gender
my %count;
$count{$_}++ for @gender;
while( my ($gender => $count) = each %count) {
delete $count{gender};}
print "Male/Female distribution:\n";
print join(' ',%count), "\n";
print "\n";
#email
#states
my %scount;
$scount{$_}++ for @state;
while( my ($state => $scount) = each %scount){
delete $scount{state};}
print join(' ',%scount), "\n";
and its result:
8 NC 292 OK 163 NY 901 VA 477 PA 195 NE 62 OH 711 WV 37 NM 10 MO 7 NH 77 MA 689 MN 431 TX 920 ME 81 NJ 673 RI 91 AL 230 KS 22 ND 31 FL 461 CT 305 CA 1262 IA 139 DE 33 CO 118 MI 378 IN 211 AR 163 IL 811 KY 11
So for example I would want a new line after NM 10.
Change print join(' ',%scount), "\n"; to:
my $n;
print map {$n++; $n % 10 ? "$_ $scount{$_} ":"$_ $scount{$_}\n"} keys %scount;
Possibly a bit more readable:
my @results = %scount;
while (@results) {
# Take 20 elements at a time: 10 keys plus their 10 counts.
print(join(" ", splice(@results, 0, 20)), "\n");
}
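If the states should also come out in a predictable order, the same chunking idea can be wrapped in a small helper. This is only a sketch: format_rows is a made-up name, and the sample hash below is invented data standing in for %scount.

```perl
use strict;
use warnings;

# Hypothetical sample counts; in the question these come from %scount.
my %scount = map { sprintf('S%02d', $_) => $_ } 1 .. 23;

# Format a hash as lines holding at most $per_line "key value" pairs.
sub format_rows {
    my ($counts, $per_line) = @_;
    my @pairs = map { "$_ $counts->{$_}" } sort keys %$counts;
    my @lines;
    while (@pairs) {
        # splice removes and returns the first $per_line pairs
        # (or fewer on the final pass, which is not an error).
        push @lines, join ' ', splice(@pairs, 0, $per_line);
    }
    return @lines;
}

print "$_\n" for format_rows(\%scount, 10);
```

Building the "key value" strings first means the chunk size is counted in pairs, so 10 really means 10 results per line.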
These are the files I am reading:
#Log1
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
#Log2
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
This is the code I wrote, where I read each file line by line and then split each line:
#!/usr/bin/perl
use warnings;
use strict;
my $log1_file = "log1.log";
my $log2_file = "log2.log";
open(IN1, "<$log1_file" ) or die "Could not open file $log1_file: $!";
open(IN2, "<$log2_file" ) or die "Could not open file $log2_file: $!";
my $i_d1;
my $i_d2;
my @fields1;
my @fields2;
while (my $line = <IN1>) {
@fields1 = split " ", $line;
}
while (my $line = <IN2>) {
@fields2 = split " ", $line;
}
print "@fields1\n";
print "@fields2\n";
close IN1;
close IN2;
Output I am getting
8 42 64 x9878
9 43 65 x9879
Output Desired
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
9 43 65 x9879
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
If I use push(@fields1 , split " ", $line); I get output like this:
Time Src_id Des_id Address 0 34 56 x9870 B 36 58 x9872 D 38 60 x9874 F 40 62 x9876 H 42 64 x9878
I expected it to print the whole array, so why does it print just the last row?
Also, after this I need to compare the Time columns of both logs and print the rows in sequence, but I don't know how to walk through both arrays simultaneously in a while loop.
Please suggest a standard approach without any modules, because I need to run this on someone else's server.
The following code demonstrates how to read and print the log files
(the OP does not say why the lines are split into fields)
use strict;
use warnings;
use feature 'say';
my $fname1 = 'log1.txt';
my $fname2 = 'log2.txt';
my $div = "\t";
my $file1 = read_file($fname1);
my $file2 = read_file($fname2);
print_file($file1,$div);
print_file($file2,$div);
sub read_file {
my $fname = shift;
my @data;
open my $fh, '<', $fname
or die "Couldn't read $fname";
while( <$fh> ) {
chomp;
next if /^#Log/;
push @data, [split];
}
close $fh;
return \@data;
}
sub print_file {
my $data = shift;
my $div = shift;
say join($div,@{$_}) for @{$data};
}
Output
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
Let's assume that the OP wants to merge the two files into one, with the lines sorted on the Time field:
read the files into a %data hash with the Time field as key
print the header (@fields)
print the hash values sorted on the Time key
use strict;
use warnings;
use feature 'say';
my(@fields,%data);
my $fname1 = 'log1.txt';
my $fname2 = 'log2.txt';
read_data($fname1);
read_data($fname2);
say join("\t",@fields);
say join("\t",@{$data{$_}}) for sort { $a <=> $b } keys %data;
sub read_data {
my $fname = shift;
open my $fh, '<', $fname
or die "Couldn't open $fname";
while( <$fh> ) {
next if /^#Log/;
if( /^Time/ ) {
@fields = split;
} else {
my @line = split;
$data{$line[0]} = \@line;
}
}
close $fh;
}
Output
Time Src_id Des_id Address
0 34 56 x9870
1 35 57 x9871
2 36 58 x9872
3 37 59 x9873
4 38 60 x9874
5 39 61 x9875
6 40 62 x9876
7 41 63 x9877
8 42 64 x9878
9 43 65 x9879
Because @fields* gets overwritten on each pass through the loop. You need this:
while(my $line = <IN1>){
my @tmp = split(" ", $line);
push(@fields1, \@tmp);
}
foreach my $item (@fields1){
print("@{$item}\n");
}
Then @fields1 contains references, each pointing to the array produced by one split.
The final @fields1 looks like:
@fields1 = (
<ref> ----> ["0", "34", "56", "x9870"]
<ref> ----> ["2", "36", "58", "x9872"]
...
)
The print will print:
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
It would also be better to chomp($line) first.
Personally, though, I would simply do push(@fields1, $line) and split each array item at the comparison stage.
To compare the contents of the two files, I would use two while loops to read them into two arrays, just as you have done, and then do the comparison in a single for or foreach loop.
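For the "both arrays simultaneously" part, one possible approach is to keep an index into each array and always take the row with the smaller Time. The sketch below assumes each log is already sorted on Time and stored as array references, as in the loop above; merge_by_time is a made-up helper name, and the header lines are left out of the sample data.

```perl
use strict;
use warnings;

# Rows as array references. The values are taken from the two logs
# in the question, with the header lines left out.
my @fields1 = ([0, 34, 56, 'x9870'], [2, 36, 58, 'x9872'], [4, 38, 60, 'x9874']);
my @fields2 = ([1, 35, 57, 'x9871'], [3, 37, 59, 'x9873'], [5, 39, 61, 'x9875']);

# Merge two lists of rows that are each already sorted on the Time column.
sub merge_by_time {
    my ($log1, $log2) = @_;
    my ($i, $j) = (0, 0);
    my @merged;
    while ($i < @$log1 && $j < @$log2) {
        if ($log1->[$i][0] <= $log2->[$j][0]) {
            push @merged, $log1->[$i++];
        }
        else {
            push @merged, $log2->[$j++];
        }
    }
    # One list is exhausted; append whatever remains of the other.
    push @merged, @{$log1}[$i .. $#{$log1}], @{$log2}[$j .. $#{$log2}];
    return @merged;
}

print "@$_\n" for merge_by_time(\@fields1, \@fields2);
```

This uses only core Perl, so it should run on a server where no extra modules can be installed.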
You can merge the log files using paste, and read the resulting merged file one line at a time. This is more elegant and saves RAM. Here is an example of a possible comparison of time1 and time2, writing STDOUT and STDERR into separate files. The example prints into STDOUT all the input fields if time1 < time2 and time1 < 4, otherwise prints a warning into STDERR:
cat > log1.log <<EOF
Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
EOF
cat > log2.log <<EOF
Time Src_id Des_id Address
1 35 57 x9871
3 37 59 x9873
5 39 61 x9875
7 41 63 x9877
9 43 65 x9879
EOF
# Paste files side by side, skip header, read data lines together, compare and print:
paste log1.log log2.log | \
tail -n +2 | \
perl -lane '
BEGIN {
for $file_num (1, 2) { push @col_names, map { "$_$file_num" } qw( time src_id des_id address ) }
}
my %val;
@val{ @col_names } = @F;
if ( $val{time1} < $val{time2} and $val{time1} < 4) {
print join "\t", @val{ @col_names};
} else {
warn "not found: @val{ qw( time1 time2 ) }";
}
' 1>out.tsv 2>out.log
Output:
% cat out.tsv
0 34 56 x9870 1 35 57 x9871
2 36 58 x9872 3 37 59 x9873
% cat out.log
not found: 4 5 at -e line 10, <> line 3.
not found: 6 7 at -e line 10, <> line 4.
not found: 8 9 at -e line 10, <> line 5.
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
I am getting the following errors while running the script.
Use of uninitialized value in print at PreProcess.pl line 137.
Use of uninitialized value within @spl in substitution (s///) at PreProcess.pl line 137.
Is there a syntax error in the script?
(Running on Windows with the latest 64-bit Strawberry Perl.)
my $Dat=2;
my $a = 7;
foreach (@spl) {
if ( $_ =~ $NameInstru ) {
print $spl[$Dat] =~ s/-/\./gr, " 00:00; ",$spl[$a],"\n"; # data
$Dat += 87;
$a += 87;
}
}
Inside the array I have this type of data:
"U.S. DOLLAR INDEX - ICE FUTURES U.S."
150113
2015-01-13
098662
ICUS
01
098
128104
14111
88637
505
13200
50
269
43140
34142
1862
37355
482
180
110623
126128
17480
1976
1081
-3699
8571
-120
646
50
248
1581
-8006
319
2093
31
-30
1039
1063
42
18
100.0
11.0
69.2
0.4
10.3
0.0
0.2
33.7
26.7
1.5
29.2
0.4
0.1
86.4
98.5
13.6
1.5
215
7
.
.
16
.
.
50
16
8
116
6
4
197
34
28.6
85.1
41.3
91.3
28.2
85.1
40.8
91.2
"(U.S. DOLLAR INDEX X $1000)"
"098662"
"ICUS"
"098"
"F90"
"Combined"
"U.S. DOLLAR INDEX - ICE FUTURES U.S."
150106
2015-01-06
098662
ICUS
01
098
127023
17810
80066
625
12554
0
21
41559
42148
1544
35262
452
210
109585
125065
17438
1958
19675
486
23911
49
2717
0
-73
9262
-5037
30
5873
270
95
18439
19245
1237
431
100.0
14.0
63.0
0.5
9.9
0.0
0.0
32.7
33.2
1.2
27.8
0.4
0.2
86.3
98.5
13.7
1.5
202
7
.
.
16
0
.
48
16
9
105
6
4
185
34
29.3
83.2
43.2
90.6
28.9
83.2
42.8
90.5
"(U.S. DOLLAR INDEX X $1000)"
"098662"
"ICUS"
"098"
"F90"
"Combined"
You are probably trying to load a file of data sets of a size of 87 lines each into an array, and then you get an error at the end of your data, when you try to read outside of the last array index.
You can probably solve it by iterating over the array indexes instead of the array values, e.g.
my $Dat = 2;
my $a = 7;
my $set_size = 87;
for (my $n = 0; $n + $a < @spl; $n += $set_size) {
if ( $spl[$n] =~ $NameInstru ) {
print $spl[$n + $Dat] =~ s/-/\./gr, " 00:00; ",$spl[$n + $a],"\n"; # data
}
}
While this might solve your problem, it might be better to try and find a proper way to parse your file.
If the records inside the input file are separated by a blank line, you can try to read whole records at once by changing the input record separator to "" or "\n\n". Then you can split each element in the resulting array on newline \n and get an entire record as a result. For example:
$/ = "";
my @spl;
open my $fh ...
while (<$fh>) {
push @spl, [ split "\n", $_ ];
}
...
for my $record (@spl) {
# @$record is now an array holding the 87 lines of one record from the file
}
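To see paragraph mode in isolation, here is a small self-contained sketch. It reads from an in-memory filehandle instead of a real file, and the two short records are invented; read_records is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical input: two short records separated by a blank line.
my $text = "A\n1\n2\n\nB\n3\n4\n";

sub read_records {
    my ($source) = @_;
    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $fh, '<', \$source or die "Cannot open in-memory file: $!";
    local $/ = "";    # paragraph mode: records end at one or more blank lines
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;    # in paragraph mode chomp removes all trailing newlines
        push @records, [ split /\n/, $chunk ];
    }
    close $fh;
    return @records;
}

my @spl = read_records($text);
print scalar(@spl), " records; first is: @{ $spl[0] }\n";
```

Using local keeps the change to $/ confined to the subroutine, so other reads in the program still work line by line.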
TLP's solution of iterating over the indexes of the array, incrementing by 87 at a time, is great.
Here's a more complex solution, but one that doesn't require loading the entire file into memory.
my $lines_per_row = 87;
my @row;
while (<>) {
chomp;
push @row, $_;
if (@row == $lines_per_row) {
my ($instru, $dat, $a) = @row[0, 2, 7];
if ($instru =~ $NameInstru) {
print $dat =~ s/-/\./gr, " 00:00; $a\n";
}
@row = ();
}
}
I am trying to write a program that will accept a pdb file, extract all the information (atom number, atom type, residue name, residue number, x, y, z, b factor), rearrange the residue number, and save the new pdb in a new archive. I can't find a way to use a loop with a string array
This is the code:
print "\nEnter the input file: ";
$inputFile = <STDIN>;
chomp $inputFile;
unless ( open( INPUTFILE, $inputFile ) ) {
print "Cannot read from '$inputFile'.\nProgram closing.\n";
<STDIN>;
exit;
}
chomp( @dataArray = <INPUTFILE> );
close(INPUTFILE);
for ( $line = 0 ; $line <= scalar @dataArray ; $line++ ) {
if ( $dataArray[$line] =~ m/ATOM\s+(\d+)\s+(\w+)\s+(\w{3})\s+(\w)+\s+(\d+)\s+(\S+\.\S+)\s+(\S+\.\S+)\s+(\S+\.\S+)\s+(.+\S)(.\d\d+\.\d\d.+)/ig ) {
$m1 = $1;
$m2 = $2;
$m3 = $3;
$m5 = $5;
$m6 = $6;
$m7 = $7;
$m8 = $8;
$m9 = $9;
$m10 = $10;
push( @m3, $m3 );
push( @m5, $m5 );
foreach $line ( @m3, @m5 ) {
if ( $m3[$line] eq $m3[ $line + 1 ] ) {
$m5[i] = $m5[ i + 1 ];
}
elsif ( $m3[$line] ne $m3[ $line + 1 ] ) {
$m5[ i + 1 ] = $m5[i] + 1;
}
}
$~ = "PDBFORMAT";
format PDBFORMAT =
ATOM @|||| @||| @|| @||| @|||||| @|||||| @|||||| @>>>>> @>>>>>
$m1, $m2, $m3,$m51, $m6, $m7, $m8, $m9, $m10
.
open( PDBFORMAT, ">>my2pdb.txt" ) or die "Can't open anything";
write PDBFORMAT;
}
}
close PDBFORMAT;
I need to make a script that will make the 6th column continuous according to the residue name (4th column)
This is an example of the input
ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64
ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07
ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68
ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63
ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99
ATOM 321 N ARG A 565 -38.634 34.587 18.911 1.00 22.27
I think this will do what you want. Your sample data isn't very comprehensive, so all it does here is change the final residue number to 609.
This program expects the path to the input file as a parameter on the command line, so something like
perl process_pdb.pl infile.pdb
use strict;
use warnings;
my ($last_name, $last_num);
while ( <> ) {
next unless /^ATOM/;
my @fields = split;
my $name = $fields[3];
if ( $last_name ) {
$fields[5] = $name eq $last_name ? $last_num : $last_num + 1;
}
print "@fields\n";
($last_name, $last_num) = @fields[3,5];
}
Output
ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64
ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07
ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68
ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63
ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99
ATOM 321 N ARG A 609 -38.634 34.587 18.911 1.00 22.27
I have something I cannot get my head around
Let's say I have a phone list used for receiving and dialing out stored like below. The from and to location is specified as well.
Country1 Country2 number1 number2
USA_Chicago USA_LA 12 14
AUS_Sydney USA_Chicago 19 15
AUS_Sydney USA_Chicago 22 21
CHI_Hong-Kong RSA_Joburg 72 23
USA_LA USA_Chicago 93 27
Now all I want to do is to remove all the duplicates and give only what is relevant to the countries as keys and each number that is assigned to it in a pair, but the pair needs to be bi-directional.
In other words I need to get results back and then print them like this.
USA_Chicago-USA_LA 27 93 12 14
Aus_Sydney-USA_Chicago 19 15 22 21
CHI_Hong-kong-RSA_Joburg 72 23
I have tried many methods including a normal hash table and the results seem fine, but it does not do the bi-direction, so I will get this instead.
USA_Chicago-USA_LA 12 14
Aus_Sydney-USA_Chicago 19 15 22 21
CHI_Hong-kong-RSA_Joburg 72 23
USA_LA-USA_Chicago 93 27
So the duplicate removal works in one way, but because there is another direction, it will not remove the duplicate "USA_LA-USA_Chicago" which already exists as "USA_Chicago-USA_LA" and will store the same numbers under a swopped name.
The hash table I tried last is something like this. (not exactly as I trashed the lot and had to rewrite it for this post)
my @input = ("USA_Chicago USA_LA 12 14" ,
"AUS_Sydney USA_Chicago 19 15" ,
"AUS_Sydney USA_Chicago 22 21" ,
"CHI_Hong-Kong RSA_Joburg 72 23" ,
"USA_LA USA_Chicago 93 27");
my %hash;
for my $line (@input) {
my ($c1, $c2, $n1, $n2) = split / [\s\|]+ /x, $line;
my $arr = $hash{$c1} ||= [];
push @$arr, "$n1 $n2";
}
for my $c1 (sort keys %hash) {
my $arr = $hash{$c1};
my $vals = join " : ", @$arr;
print "$c1 $vals\n";
}
So if A-B exists and so does B-A, use only one of them, but assign the values from the key being removed to the remaining key. Basically, I need to get rid of any duplicate key in either direction while keeping its values under the remaining key. So A-B and B-A would be considered duplicates, but A-C and B-C are not. -_-
Simply normalise the destinations. I chose to sort them.
use strictures;
use Hash::MultiKey qw();
my @input = (
'USA_Chicago USA_LA 12 14',
'AUS_Sydney USA_Chicago 19 15',
'AUS_Sydney USA_Chicago 22 21',
'CHI_Hong-Kong RSA_Joburg 72 23',
'USA_LA USA_Chicago 93 27'
);
tie my %hash, 'Hash::MultiKey';
for my $line (@input) {
my ($c1, $c2, $n1, $n2) = split / [\s\|]+ /x, $line;
my %map = ($c1 => $n1, $c2 => $n2);
push @{ $hash{[sort keys %map]} }, @map{sort keys %map};
}
__END__
(
['CHI_Hong-Kong', 'RSA_Joburg'] => [72, 23],
['AUS_Sydney', 'USA_Chicago'] => [19, 15, 22, 21],
['USA_Chicago', 'USA_LA'] => [12, 14, 27, 93],
)
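If installing Hash::MultiKey is not an option, the same normalisation can be sketched with core Perl only by sorting the two locations and joining them into an ordinary string key. This assumes location names never contain whitespace; group_pairs is a made-up helper name.

```perl
use strict;
use warnings;

my @input = (
    'USA_Chicago USA_LA 12 14',
    'AUS_Sydney USA_Chicago 19 15',
    'AUS_Sydney USA_Chicago 22 21',
    'CHI_Hong-Kong RSA_Joburg 72 23',
    'USA_LA USA_Chicago 93 27',
);

# Group the numbers under a direction-independent "A B" key.
sub group_pairs {
    my @lines = @_;
    my %hash;
    for my $line (@lines) {
        my ($c1, $c2, $n1, $n2) = split ' ', $line;
        # Swap the names together with their numbers, so that each
        # number stays attached to its own location.
        ($c1, $c2, $n1, $n2) = ($c2, $c1, $n2, $n1) if $c2 lt $c1;
        push @{ $hash{"$c1 $c2"} }, $n1, $n2;
    }
    return %hash;
}

my %hash = group_pairs(@input);
print "$_ @{ $hash{$_} }\n" for sort keys %hash;
```

A space is used as the key separator rather than a hyphen because names such as CHI_Hong-Kong already contain hyphens.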
Perl is great for creating complex data structures, but learning to use them effectively takes practice.
Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# skip the column headers
<DATA>;
my %bidirectional = ();
while( my $line = <DATA> ){
chomp $line;
my ( $country1, $country2, $number1, $number2 ) = split ' ', $line;
push @{ $bidirectional{ $country1 }{ $country2 } }, [ $number1, $number2 ];
push @{ $bidirectional{ $country2 }{ $country1 } }, [ $number1, $number2 ];
}
print Dumper \%bidirectional;
__DATA__
Country1 Country2 number1 number2
USA_Chicago USA_LA 12 14
AUS_Sydney USA_Chicago 19 15
AUS_Sydney USA_Chicago 22 21
CHI_Hong-Kong RSA_Joburg 72 23
USA_LA USA_Chicago 93 27
I have this piece of code to find the greatest value in an array of arrays stored in a hash. Once Perl has identified the biggest value, that array is removed via the @slice array:
if ( max(map $_->[1], @$val)){
my @slice = (@$val[1]);
my @ignored = @slice;
delete(@$val[1]);
print "$key\t @ignored\n";
warn Dumper \@slice;
}
Data::Dumper output:
$VAR1 = [
[
'3420',
'3446',
'13',
'39',
55
]
];
I want to print this information separated by tabs (\t) on one line, as in this list:
miRNA127 dvex589433 - 131 154
miRNA154 dvex546562 + 232 259
miRNA154 dvex573491 + 297 324
miRNA154 dvex648254 + 147 172
miRNA154 dvex648254 + 287 272
miRNA32 dvex320240 - 61 83
miRNA32 dvex623745 - 141 163
miRNA79 dvex219016 + ARRAY(0x100840378)
But the last line always shows an array reference instead of the values.
How can I generate this output?
miRNA127 dvex589433 - 131 154
miRNA154 dvex546562 + 232 259
miRNA154 dvex573491 + 297 324
miRNA154 dvex648254 + 147 172
miRNA154 dvex648254 + 287 272
miRNA32 dvex320240 - 61 83
miRNA32 dvex623745 - 141 163
miRNA79 dvex219016 + 3420 3446
Additional explanation:
In this case, I want to find the highest value in $VAR->[1] and check whether subtracting the minimum in $VAR->[0] from it gives a result <= 55. If not, I need to remove that inner array (the one with the highest value) from the AoA and put it into an @ignored array. Next, I want to print some values of @ignored as a list. Then I want to repeat the same process on the remaining AoA...
print "$key\t $ignored[0]->[0]\t$ignored[0]->[1]";
You have an array of arrays, so each element of @ignored is an array reference. The notation $ignored[0] gets the zeroth element (a reference to an array), and ->[0] and ->[1] retrieve the zeroth and first elements of that array.
For example:
use strict;
use warnings;
use Data::Dumper;
my @ignored;
$ignored[0] = [ '3420', '3446', '13', '39', 55 ];
my $key = 'miRNA79 dvex219016 +';
print Dumper \@ignored;
print "\n";
print "$key\t$ignored[0]->[0]\t$ignored[0]->[1]";
Output:
$VAR1 = [
[
'3420',
'3446',
'13',
'39',
55
]
];
miRNA79 dvex219016 + 3420 3446
Another option that generates the same output is to join all the values with a \t:
print join "\t", $key, @{ $ignored[0] }[ 0, 1 ];
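If @ignored ends up holding more than one inner array, the same slice can be applied to each row in turn. A minimal sketch: format_row is a made-up helper name, and the second row below is invented for illustration.

```perl
use strict;
use warnings;

my $key = 'miRNA79 dvex219016 +';
my @ignored = (
    [ '3420', '3446', '13', '39', 55 ],    # row from the Data::Dumper output
    [ '100',  '120',  '1',  '21', 20 ],    # hypothetical second row
);

# Join the label and the first two elements of one row with tabs.
sub format_row {
    my ($label, $row) = @_;
    return join "\t", $label, @{$row}[ 0, 1 ];
}

print format_row($key, $_), "\n" for @ignored;
```

Dereferencing each row before printing is what avoids the ARRAY(0x...) text, which is what Perl shows when a reference itself is interpolated into a string.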