TCL list data to histogram - arrays

I'm doing some data analysis, and the output is a long list of numbers. Each line consists of 1 to n numbers, which may be duplicated:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 4
I'd like to put these into a (time-series) histogram. I'm not an expert in tcl (yet?), and I have some ideas how to do this but I have not been successful yet. The puts statements are just so I can see what's happening.
while { [gets $infile line] != -1 } {
set m [llength $line]
puts "line length $m"
foreach item $line {
puts $item
incr nc($item)
puts "nc: $nc($item)"
}
}
this nc array I've created is giving me a size-based array. However, I'd like a per-line based array (2D). Naively it would be nc($item)($nlines). I initially tried labeling the array variable with the length such as nc${item}($nlines), but I am not smart enough to get that to work.
I appreciate any help.
Best
Mike

Although Tcl arrays are one-dimensional, you can construct key strings to fake multi-dimensionality:
set lineno -1
set fh [open infile r]
while {[gets $fh line] != -1} {
incr lineno
foreach item [split [string trim $line]] {
incr nc($lineno,$item)
}
}
close $fh
# `parray` is a handy command for inspecting arrays
parray nc
outputs
nc(0,1) = 20
nc(0,2) = 8
nc(0,3) = 2
nc(0,4) = 1
nc(1,1) = 2
nc(1,2) = 4
nc(1,4) = 3
nc(2,1) = 1
nc(2,2) = 1
nc(2,3) = 1
nc(2,4) = 1
Or use dictionaries:
set lineno -1
set nc {}
set fh [open infile r]
while {[gets $fh line] != -1} {
set thisLine {}
foreach item [split [string trim $line]] {
dict incr thisLine $item
}
dict set nc [incr lineno] $thisLine
}
close $fh
dict for {line data} $nc {
puts [list $line $data]
}
outputs
0 {1 20 2 8 3 2 4 1}
1 {1 2 2 4 4 3}
2 {1 1 2 1 3 1 4 1}

Related

How do you rotate a single layer of a matrix by n places?

I have read all the questions on stack overflow regarding rotating matrices
and none of them address anything other than transposing or rotating by 90 or 180 degrees in either direction of either single layers or the entire matrices.
I have already answered the question as far as accomplishing it but I am looking for fast and/or 'in place' methods, preferably, but any methods
other than the one I have given would be nice to see as well if only for
educational purposes. My example is in Perl 5 but as long as your intentions are clear, most any language should be acceptable.
The matrix used in my example is as follows
# 2D array #arr has 3 layers.
# #arr is
# 0 1 2 3 4 5
# a b c d e f
# 5 4 3 2 1 0
# 9 8 7 6 5 4
# f e d c b a
# 4 5 6 7 8 9
Goal->Rotate middle layer 2 places clockwise to get the following...
Notice how the top left corner of the middle layer, 'b c', has moved 2 places to the right and the '8 4' from the middle left side is now where the 'b c' was.
# Rotated #arr is
# 0 1 2 3 4 5
# a 8 4 b c f
# 5 e 3 2 d 0
# 9 d 7 6 e 4
# f c b 5 1 a
# 4 5 6 7 8 9
Currently I have been doing something along the lines of this, which is admittedly, terribly slow. Especially on many large arrays.
#!/usr/bin/env perl
use strict; use warnings;
# Coordinates are $arr[$row][$col]
my #arr;
$arr[0][0]='0'; $arr[0][1]='1'; $arr[0][2]='2'; $arr[0][3]='3'; $arr[0][4]='4'; $arr[0][5]='5';
$arr[1][0]='a'; $arr[1][1]='b'; $arr[1][2]='c'; $arr[1][3]='d'; $arr[1][4]='e'; $arr[1][5]='f';
$arr[2][0]='5'; $arr[2][1]='4'; $arr[2][2]='3'; $arr[2][3]='2'; $arr[2][4]='1'; $arr[2][5]='0';
$arr[3][0]='9'; $arr[3][1]='8'; $arr[3][2]='7'; $arr[3][3]='6'; $arr[3][4]='5'; $arr[3][5]='4';
$arr[4][0]='f'; $arr[4][1]='e'; $arr[4][2]='d'; $arr[4][3]='c'; $arr[4][4]='b'; $arr[4][5]='a';
$arr[5][0]='4'; $arr[5][1]='5'; $arr[5][2]='6'; $arr[5][3]='7'; $arr[5][4]='8'; $arr[5][5]='9';
# Print matrix
print_matrix(#arr);
# Extract layer 2
my $target_layer=2;
my $layer_two=extract_layer($target_layer,#arr);
# From the top left corner of the layer, it is as follows
# bcde15bcde84
print "\n$layer_two\n";
# Rotate layer 2 clockwise 2 places
$layer_two=rotate_layer_cl($layer_two,2);
# 84bcde15bcde
print "$layer_two\n\n";
# Replace layer 2 in the same way
#arr=replace_layer($target_layer,$layer_two,#arr);
# Print again
print_matrix(#arr);
### Sub functions ###
# Extract layer by walking around it's coordinates like so
# [1,1]-[1,4] Top(left->right)
# [2,4]-[4,4] Right(top->bottom)
# [4,3]-[4,1] Bottom(right->left)
# [3,1]-[2,1] Left(bottom->top)
sub extract_layer {
my ($layer_cnt,#matrix)=#_;
my $width=scalar(#matrix);
my $layer_width=$width-$layer_cnt;
# layer_cnt=2
# width=6
# layer_width=4
my $layer;
for my $col ( $layer_cnt-1..$layer_width ) {
$layer.=$matrix[$layer_cnt-1][$col];
}
for my $row ( $layer_cnt..$layer_width ) {
$layer.=$matrix[$row][$layer_width];
}
my $cnt=$layer_width-1;
while ( $cnt >= $layer_cnt-1 ) {
$layer.=$matrix[$layer_width][$cnt];
$cnt--;
}
$cnt=$layer_width-1;
while ( $cnt >= $layer_cnt ) {
$layer.=$matrix[$cnt][$layer_cnt-1];
$cnt--;
}
return $layer;
}
# Shift input to the right by $n places, wrapping around.
sub rotate_layer_cl {
my $n=$_[1];
my $buf=substr($_[0],length($_[0])-$n,$n);
return $buf.substr($_[0],0,length($_[0])-$n);
}
# Replace each char from the rotated layer.
sub replace_layer {
my ($layer_cnt,$layer,#matrix)=#_;
my $width=scalar(#matrix);
my $layer_width=$width-$layer_cnt;
# layer_cnt=2
# width=6
# layer_width=4
my $slot=0;
for my $col ( $layer_cnt-1..$layer_width ) {
$matrix[$layer_cnt-1][$col]=substr($layer,$slot,1);
$slot++;
}
for my $row ( $layer_cnt..$layer_width ) {
$matrix[$row][$layer_width]=substr($layer,$slot,1);
$slot++;
}
my $cnt=$layer_width-1;
while ( $cnt >= $layer_cnt-1 ) {
$matrix[$layer_width][$cnt]=substr($layer,$slot,1);
$slot++;
$cnt--;
}
$cnt=$layer_width-1;
while ( $cnt >= $layer_cnt ) {
$matrix[$cnt][$layer_cnt-1]=substr($layer,$slot,1);
$slot++;
$cnt--;
}
return #matrix;
}
# Prints given matrix
sub print_matrix {
foreach my $row (#_) {
my $cnt=0;
foreach my $char (#$row) {
print $char; $cnt++;
if ( $cnt == scalar(#_) ) {
print "\n";
} else {
print " ";
}
}
}
}
The above code outputs as follows,
0 1 2 3 4 5
a b c d e f
5 4 3 2 1 0
9 8 7 6 5 4
f e d c b a
4 5 6 7 8 9
bcde15bcde84
84bcde15bcde
0 1 2 3 4 5
a 8 4 b c f
5 e 3 2 d 0
9 d 7 6 e 4
f c b 5 1 a
4 5 6 7 8 9
which shows the array before rotating the middle layer, the middle layer as a string, the middle layer shifted as a string, and the final resulting array after the middle layer has been rotated 2 places to the right.
-----EDIT-----
So far, the fastest method I have found is to not actually use the full 2D array but to put the block in a regular array like...
my #hex_ary=('1234', '5678', '90ab', 'cdef');
Then create the layer string and shift it the same way my original example shows.
However, because they are strings, I only need to "walk" up and down the sides. The top and bottom are simply referenced as...
my $top=substr($hex_ary[0],0,3);
and
my $bot=reverse(substr($hex_ary[3],0,3));
for the outer layer in this small 4x4 array, while the sides are itered through with
my $right_side_char=substr($hex_ary[$top_row-$i],-1);
my $left_side_char=substr($hex_ary[$bot_row+$i],0,1);
Doing so increases performance approx. 100% from the 2D method because the number of slices taken from the array is then half+2.
Maybe someone with a better grasp of C could create something more efficient. I feel like Perl is just only capable of so much sometimes.
The idea of my method is to use a queue (well, I'm using a Perl array here because I couldn't find a suitable module, but it's far from ideal) containing n elements (where n is the number of places you want to rotate your layer by), and to iterate through each elements of the layer, and each time, push its value to the queue, and replace it with the first element of the queue.
This is what the sub rotate_matrix does.
- $i and $j represent the index of the current element of the layer. Their values goes between $target_layer and $max_i/$max_j; the sub next_ij takes care of computing the index of the next element of the layer.
- The interesting stuffs happen inside the while(1) loop (which is actually a do...while; but do while loops are a bit special in Perl): the value of the current element is pushed to the queue, and, if their are more than $n elements in the queue (meaning that we have already pushed $n elements, representing the fact the we want to rotate by $n), we replace its value by the first element of the queue. The continue block takes care of incrementing $i and $j, and stops the loop once we come back to the first element of the layer.
- Once this is loop is done, the first n elements of the layer haven't been updated (because of this next unless #queue > $n), so we need to take care of that: this is the while (#queue) loop).
Complexity-wise, there is no need to copy the array. The memory usage is O(n) (the size of the queue). Time-wise, you loop once over each element of the layer, and each time, you push and shift from you queue (using a better datastructure, like a linked list, this would be O(1), but with a Perl array, it's probably in the lines of O(n); but probably amortized to O(1)), and compute the next index (O(1)). So you end up with a time complexity, you end up with O(size of the layer to rotate).
A few caveats: if $n is greater than the number of elements in the layer, this won't work (but it can be fixed with a simple modulo). If you ask for a layer that isn't present in the matrix (like 4-th layer of a 2x2 matrix), weird stuffs will happen (but once again, it's easy to fix).
#!/usr/bin/perl
use strict;
use warnings;
use v5.14;
my #arr = ( [qw(0 1 2 3 4 5)],
[qw(a b c d e f)],
[qw(5 4 3 2 1 0)],
[qw(9 8 7 6 5 4)],
[qw(f e d c b a)],
[qw(4 5 6 7 8 9)] );
print_matrix(#arr);
rotate_matrix(\#arr, 2, 2);
say "";
print_matrix(#arr);
sub rotate_matrix {
my ($matrix, $target_layer, $n) = #_;
--$target_layer; # I prefer a 0-indexed value
# TODO: check that $target_layer < #$matrix/2
# TODO: do something like $n = $n % (num of elements in layer)
my #queue;
my ($i, $j) = ($target_layer, $target_layer);
my ($max_i, $max_j) = (#{$matrix->[0]}-$target_layer-1, #$matrix-$target_layer-1);
while (1) { # Actually a do...while loop (see 'continue' block)
push #queue, $matrix->[$i][$j];
next unless #queue > $n; # Waiting to reach n-th element
$matrix->[$i][$j] = shift #queue;
} continue {
($i, $j) = next_ij($target_layer,$max_i,$max_j,$i,$j);
# Stopping if we are back at the begining
last if $i == $target_layer && $j == $target_layer;
}
# Emptying queue
while (#queue) {
$matrix->[$i][$j] = shift #queue;
($i, $j) = next_ij($target_layer,$max_i,$max_j,$i,$j);
}
}
# Computes the index of the next element of the layer
sub next_ij {
my ($target_layer, $max_i, $max_j, $i, $j) = #_;
if ($j == $max_j) { # Last column
if ($i == $max_i) { # Last row (bottom right -> going left)
return ($i, $j-1);
} else { # Not last row (somewhere on the right col -> going down)
return ($i+1, $j);
}
} elsif ($j == $target_layer) { # First column
if ($i == $target_layer) { # First row (top left -> going right)
return ($i, $j+1);
} else { # Not top row (somewhere on the left col -> going up)
return ($i-1, $j);
}
} else { # Neither first nor last column
if ($i == $target_layer) { # First row (somewhere on the top row -> going right)
return ($i, $j+1);
} else { # Last row (somewhere on the last row -> going left)
return ($i, $j-1);
}
}
}
# Prints given matrix
sub print_matrix {
foreach my $row (#_) {
my $cnt=0;
foreach my $char (#$row) {
print $char; $cnt++;
if ( $cnt == scalar(#_) ) {
print "\n";
} else {
print " ";
}
}
}
}
The approach in a nutshell:
copy array
calculate list of members (row, column) for the ring
loop over list index
read from member[index] in original array
write to member[(index + shift) % ring length] in copy array
Right shift by N places: shift == N
Left shift by N places: shift == -N
#!/usr/bin/perl
use strict;
use warnings;
sub dump_array($$) {
my($label, $array) = #_;
print "${label}:\n";
foreach my $row (#{ $array }) {
print join(" ", #{ $row }), "\n";
}
print "\n";
}
sub copy_array($) {
my($array) = #_;
my #copy;
foreach my $row (#{ $array }) {
push(#copy, [ #{ $row } ]);
}
return(\#copy);
}
sub ring_index_to_row_col($$$$$) {
my($N, $ring, $total, $length, $index) = #_;
my($row, $col);
if ($index < $length) {
# top row, left to right
$row = 0;
$col = $index;
} elsif ($index < 2 * $length - 2) {
# top to bottom, right row
$row = $index - $length + 1;
$col = $N - 1 - 2 * $ring;
} elsif ($index < 3 * $length - 2) {
# bottom row, right to left
$row = $N - 1 - 2 * $ring;
#$col = $length - 1 - ($index - 2 * $length + 2);
$col = -$index + 3 * $length - 3;
} else {
# bottom to top, left row
#$row = $total - 1 - $index + 1;
$row = $total - $index;
$col = 0;
}
#print "${index}\t of ${total}\t-> ${row}, ${col}\n";
# shift $length x length array to offset ($ring, $ring)
return([$row + $ring, $col + $ring]);
}
sub ring_members($$) {
my($N, $ring) = #_;
my #list;
# #TODO: odd N?
#
# Examples for N == 6
# 0 -> 2*6 + 2*4 = 20
# 1 -> 2*4 + 2*2 = 12
# 2 -> 2*2 + 2*0 = 4
#
# Examples for N == 5
# 0 -> 2*5 + 2*3
# 1 -> 2*3 + 2*1
# 2 -> 1
#
# Examples for N == 4
# 0 -> 2*4 + 2*2 = 12
# 1 -> 2*2 + 2*0 = 4
my $length = $N - 2 * $ring;
my $members = 4 * $N - 8 * $ring - 4;
foreach my $index (0..$members-1) {
push(#list, ring_index_to_row_col($N, $ring, $members, $length, $index));
}
return(\#list);
}
sub rotate_array_ring(\#$$) {
my($source, $ring, $shift) = #_;
# Sanity check. #TODO is the check correct for odd N?
my $N = #{ $source };
die "ERROR: invalid ring '${ring}' for 2D array of size $N!\n"
unless $ring < ($N / 2);
my $copy = copy_array($source);
my $list = ring_members($N, $ring);
my $length = #{ $list };
foreach my $index (0..$length-1) {
my($row, $col) = #{ $list->[ $index ] };
my($s_row, $s_col) = #{ $list->[($index + $shift) % $length ] };
$copy->[$s_row]->[$s_col] = $source->[$row]->[$col];
}
return($copy);
}
my #source;
while (<DATA>) {
chomp;
push(#source, [ split(/\s+/, $_) ]);
}
dump_array('SOURCE', \#source);
dump_array('SHIFT 1 RING 0', rotate_array_ring(#source, 0, 1));
dump_array('SHIFT 1 RING 1', rotate_array_ring(#source, 1, 1));
dump_array('SHIFT 1 RING 2', rotate_array_ring(#source, 2, 1));
dump_array('SHIFT 2 RING 0', rotate_array_ring(#source, 0, 2));
dump_array('SHIFT 2 RING 1', rotate_array_ring(#source, 1, 2));
dump_array('SHIFT 2 RING 2', rotate_array_ring(#source, 2, 2));
exit 0;
__DATA__
0 1 2 3 4 5
a b c d e f
5 4 3 2 1 0
9 8 7 6 5 4
f e d c b a
4 5 6 7 8 9
Example output:
$ perl dummy.pl
SOURCE:
0 1 2 3 4 5
a b c d e f
5 4 3 2 1 0
9 8 7 6 5 4
f e d c b a
4 5 6 7 8 9
SHIFT 1 RING 0:
a 0 1 2 3 4
5 b c d e 5
9 4 3 2 1 f
f 8 7 6 5 0
4 e d c b 4
5 6 7 8 9 a
SHIFT 1 RING 1:
0 1 2 3 4 5
a 4 b c d f
5 8 3 2 e 0
9 e 7 6 1 4
f d c b 5 a
4 5 6 7 8 9
SHIFT 1 RING 2:
0 1 2 3 4 5
a b c d e f
5 4 7 3 1 0
9 8 6 2 5 4
f e d c b a
4 5 6 7 8 9
SHIFT 2 RING 0:
5 a 0 1 2 3
9 b c d e 4
f 4 3 2 1 5
4 8 7 6 5 f
5 e d c b 0
6 7 8 9 a 4
SHIFT 2 RING 1:
0 1 2 3 4 5
a 8 4 b c f
5 e 3 2 d 0
9 d 7 6 e 4
f c b 5 1 a
4 5 6 7 8 9
SHIFT 2 RING 2:
0 1 2 3 4 5
a b c d e f
5 4 6 7 1 0
9 8 2 3 5 4
f e d c b a
4 5 6 7 8 9
OPTIMIZATION: don't copy the whole array, but
use $list to copy out the ring members to a list and then
loop over that list to place them into the shifted position back into the array.
I.e. two loops instead of one:
my $list = ring_members($N, $ring);
my $length = #{ $list };
# in-place approach
my #members;
foreach my $index (0..$length-1) {
my($row, $col) = #{ $list->[ $index ] };
push(#members, $source->[$row]->[$col]);
}
foreach my $index (0..$length-1) {
my($row, $col) = #{ $list->[ ($index + $shift) % $length ] };
$source->[$row]->[$col] = $members[$index];
}
return($source);
OPTIMIZATION 2: if you have a sequence of operations that you need to apply to a set of same-sized 2D arrays, then you could pre-generate them and process them over-and-over using a function:
my $N = 6; # all arrays are of same size
my #rings = (
ring_members($N, 0),
ring_members($N, 1),
ring_members($N, 2),
);
my #operations = (
{ ring => $rings[0], shift => 1 },
{ ring => $rings[1], shift => -1 },
{ ring => $rings[2], shift => 2 },
# ...
)
# apply one operation in-place on array
sub apply_operation($$) {
my($array, $operation) = #_;
my $list = $operation->{ring};
my $shift = $operation->{shift};
my $length = #{ $list };
my #members;
foreach my $index (0..$length-1) {
my($row, $col) = #{ $list->[ $index ] };
push(#members, $array->[$row]->[$col]);
}
foreach my $index (0..$length-1) {
my($row, $col) = #{ $list->[ ($index + $shift) % $length ] };
$array->[$row]->[$col] = $members[$index];
}
}
# apply operation sequence in-place on array
sub apply_operations($$) {
my($array, $operations) = #_;
foreach my $operation (#{ $operations }) {
apply_operation($array, $operation);
}
}
apply_operations(\#array1, \#operations);
apply_operations(\#array2, \#operations);
apply_operations(\#array3, \#operations);
...
OPTIMIZATION 3: don't make a copy of the complete group in the in-place approach, but only of those members which would be overwritten by the roation. I.e. use 3 loops:
copy the first or last $shift members (depending on the rotation direction) to #cache array.
do a straight copy for the other $length - abs($shift) members.
copy #cache members to their rotated places.
OPTIMIZATION 4: don't use a 2D array (AoA) but a 1D array with $N * $N entries. I.e.
instead of ring_index_to_row_col() returning [$row, $col] you would have ring_index_to_array_index() returning $row * $N + $col.
array accesses ->[$row]->[$col] would be replaced with ->[$array_index].

How Can I read TCL file value by value

I have a file in Tcl, and I want to read each value alone as the array... I recognize my file like this
PUx(1) 1 2 3 4 5
PUx(2) 1 2 3 4 5
PUx(3) 1 2 3 4 5
PUx(4) 1 2 3 4 5
PUx(5) 1 2 3 4 5
So, I want to get, for example, the value of PUx(1) one by one and add it to the variable.
enter image description here
As Donal pointed out, it all depends on the actual format. But if the example data is representative, this might work for you:
set content {PUx(1) 1 2 3 4 5
PUx(2) 1 2 3 4 5
PUx(3) 1 2 3 4 5
PUx(4) 1 2 3 4 5
PUx(5) 1 2 3 4 5}
foreach line [split $content \n] {
set values [lassign $line varName]
set $varName $values
}
parray PUx
lassign assumes the line-wise data to represent a valid Tcl list. This might or might not be the case for you.
Update
You might want to re-organize your dataset, this would allow you to use a Tcl array idiom to access "rows" and "columns" of data in a straightforward manner, more or less:
set content {PUx(1,1) 1
PUx(1,2) 2
PUx(1,3) 3
PUx(1,4) 4
PUx(1,5) 5
PUx(2,1) 1
PUx(2,2) 2
PUx(2,3) 3
PUx(2,4) 4
PUx(2,5) 5}
foreach line [split $content \n] {
set values [lassign $line varName]
set $varName $values
}
parray PUx
# first column: *,1
foreach {k v} [array get PUx *,1] {
puts $v
}
# first row: 1,*
foreach {k v} [array get PUx 1,*] {
puts $v
}
Provided that your main concern is how to compute the sum over a list of elements, these are three options available to:
proc lsum1 {x} {
set r 0
foreach i $x {
incr r $i
}
return $r
}
proc lsum2 {x} {
expr [join $x " + "]
}
proc lsum3 {x} {
::tcl::mathop::+ {*}$x
}
set x {1 2 3 4 5}
lsum1 $x
lsum2 $x
lsum3 $x
lsum1 and lsum3 are preferable. lsum2 is the literate translation of what you describe as your "problem", at least in my reading. You may also want to check the Tcl wiki. It gives you some background on the details of lsum3.
This can be easily integrated with reading your data, as shown in my first answer:
lsum1 $PUx(1)
lsum3 $PUx(1)

Merging two file based on columns and sorting

I have two files, FILE1 and FILE2, that have a different number of
columns and some columns in common. In both files the first column is
a row identifier. I want to merge the two files (FILE1 and FILE2)
without changing the order of the columns, and where there is a missing
value input the value '5'.
For example FILE1 (first column is the row ID, A1 is the first row, A2
the second, ...):
A1 1 2 5 1
A2 0 2 1 1
A3 1 0 2 2
The column names for FILE1 is (these are specified in another file),
Affy1
Affy3
Affy4
Affy5
which is to say that the value in row A1, column Affy1 is 1
and the value in row A3, column Affy5 is 2
v~~~~~ Affy3
A1 1 2 5 1
A2 0 2 1 1
A3 1 0 2 2
^~~~ Affy1
Similarly for FILE2
B1 1 2 0
B2 0 1 1
B3 5 1 1
And its column names,
Affy1
Affy2
Affy3
Meaning that
v~~~~~ Affy2
B1 1 2 0
B2 0 1 1
B3 5 1 1
^~~~ Affy1
I want to merge and sort columns based on the column names and put a
'5' for missing values. so the merged result would be as follows:
A1 1 5 2 5 1
A2 0 5 2 1 1
A3 1 5 0 2 2
B1 1 2 0 5 5
B2 0 1 1 5 5
B3 5 1 1 5 5
And the columns:
Affy1
Affy2
Affy3
Affy4
Affy5
Which is to say,
v~~~~~~~ Affy2
A1 1 5 2 5 1
A2 0 5 2 1 1
A3 1 5 0 2 2
B1 1 2 0 5 5
B2 0 1 1 5 5
B3 5 1 1 5 5
^~~~ Affy1
In reality I have over 700K columns and over 2K rows in each file. Thanks in advance!
The difficult part is ordering the headers when some of them appear only in one file. The best way I know is to build a directed graph using the Graph module and sort the elements topologically
Once that's done it's simply a matter of assigning the values from each file to the correct columns and filling the blanks with 5s
I've incorporated the headers as the first line of each data file, so this program works with this data
file1.txt
ID Affy1 Affy3 Affy4 Affy5
A1 1 2 5 1
A2 0 2 1 1
A3 1 0 2 2
file2.txt
ID Affy1 Affy2 Affy3
B1 1 2 0
B2 0 1 1
B3 5 1 1
And here's the code
consolidate_columns.pl
use strict;
use warnings 'all';
use Graph::Directed;
my #files = qw/ file1.txt file2.txt /;
# Make an array of two file handles
#
my #fh = map {
open my $fh, '<', $_ or die qq{Unable to open "$_" for input: $!};
$fh;
} #files;
# Make an array of two lists of header names
#
my #file_heads = map { [ split ' ', <$_> ] } #fh;
# Use a directed grapoh to sort all of the header names so thet they're
# still in the order that they were at the top of both files
#
my #ordered_headers = do {
my $g = Graph::Directed->new;
for my $f ( 0, 1 ) {
my $file_heads = $file_heads[$f];
$g->add_edge($file_heads->[$_], $file_heads->[$_+1]) for 0 .. $#$file_heads-1;
}
$g->topological_sort;
};
# Form a hash converting header names to column indexes for output
#
my %ordered_headers = map { $ordered_headers[$_] => $_ } 0 .. $#ordered_headers;
# Print the header and the reformed records from each file. Use the hash to
# convert the header names into column indexes
#
print "#ordered_headers\n";
for my $i ( 0 .. $#fh ) {
my $fh = $fh[$i];
my #file_heads = #{ $file_heads[$i] };
my #splice = map { $ordered_headers{$_} } #file_heads;
while ( <$fh> ) {
next unless /\S/;
my #columns;
#columns[#splice] = split;
$_ //= 5 for #columns[0 .. $#ordered_headers];
print "#columns\n";
}
}
output
ID Affy1 Affy2 Affy3 Affy4 Affy5
A1 1 5 2 5 1
A2 0 5 2 1 1
A3 1 5 0 2 2
B1 1 2 0 5 5
B2 0 1 1 5 5
B3 5 1 1 5 5
For the fun of it -- HTH
#!/usr/bin/perl
use warnings;
use strict;
use constant {A => 1, B => 2, BOTH =>3};
# I don't read data from file
my #columns = qw(Affy1 Affy2 Affy3 Affy4 Affy5);
my #locations = (BOTH, B, BOTH, A, A);
my #contentA = (["A1", 1, 2, 5, 1],
["A2", 0, 2, 1, 1],
["A3", 1, 0, 2, 2]);
my #contentB = (["B1", 1, 2, 0],
["B2", 0, 1, 1],
["B3", 5, 1, 1]);
#I assume both files have the same amount of lines
my #ares = ();
my #bres = ();
for(my $i = 0; $i < #contentA; ++$i){
# this uses a lot of memory whith huge amounts of data
# maybe you write this in two temp result files and cat them
# together at the end
# another alternative would be to iterate first over
# file A and then over file A
my #row_a = ();
my #row_b = ();
push #row_a, shift #{$contentA[$i]}; #id
push #row_b, shift #{$contentB[$i]}; #id
foreach my $loc (#locations){
if(A == $loc){
push #row_a, shift #{$contentA[$i]};
push #row_b, 5;
}
if(B == $loc){
push #row_a, 5;
push #row_b, shift #{$contentB[$i]};
}
if(BOTH == $loc){
push #row_a, shift #{$contentA[$i]};
push #row_b, shift #{$contentB[$i]};
}
}
push #ares, \#row_a;
push #bres, \#row_b;
}
foreach my $ar(#ares){
print join " ", #{$ar};
print "\n";
}
foreach my $br(#bres){
print join " ", #{$br};
print "\n";
}
print join("\n", #columns);
print "\n";

Fetching indices of a text file from another text file

The title may not be so descriptive. Let me explain:
I have a file (Say File 1) having some numbers [delimited by a space]. see here,
1 2 3 4 5
1 2 8 4 5 6 7
1 9 3 4 5 6 7 8
..... n lines (length of each line varies).
I have another file (Say File 2) having some numbers [delimited by a tab]. see here,
1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 1 1 1 0 1 1 1 1 1
..... m lines (length of each line fixed).
I want sum of 1 2 3 4 5 th (file 1 Line 1) position of file 2, line 1
I want sum of 1 2 3 4 5 6 7 th (file 1 Line 2) position of file 2, line 1 and so on.
I want linewise sum of file 2 with positions all lines in file 1
It will look like:
5 6 6 …n columns (File 1)
1 8 3
9 8 4
… m rows (File 2)
I did this by the following code:
open( FH1, "File1.txt" );
#index = <FH1>;
open( FH2, "File2.txt" );
#matrix = <FH2>;
open( OUTPUT, ">sum.txt" );
foreach $xx (#matrix) {
#k1 = split( /\t/, "$xx" );
foreach $yy (#index) {
#k2 = split( / /, "$yy" );
$ssum = 0;
foreach $zz (#k2) {
$zz1 = $zz - 1;
if ( $k1[$zz1] == 1 ) {
$ssum++;
}
}
printf OUTPUT"$ssum\t";
$ssum = 0;
}
print OUTPUT"\n";
}
close FH1;
close FH2;
close OUTPUT;
It works absolutely fine except that, the time time requirement is enormous for large files. (e.g. 1000 lines File 1 X 25000 lines File 2 : The time is 8 minutes .
My data may exceed 4 times this example. And it's unacceptable for my users.
How to accomplish this, consuming much lesser time. or by Any other concept.
Always include use strict; and use warnings; in every PERL script.
You can simplify your script by not processing the first file multiple times. Also, you coding style is very outdated. You use with some lessons from Modern Perl Book by chromatic.
The following is your script simplified to take advantage of more modern style and techniques. Note, that it currently loads the file data from inside the script instead of external sources:
use strict;
use warnings;
use autodie;
use List::Util qw(sum);
my #indexes = do {
#open my $fh, '<', "File1.txt";
open my $fh, '<', \ "1 2 3 4 5\n1 2 8 4 5 6 7\n1 9 3 4 5 6 7 8\n";
map { [map {$_ - 1} split ' '] } <$fh>
};
#open my $infh, '<', "File2.txt";
my $infh = \*DATA;
#open my $outfh, '>', "sum.txt";
my $outfh = \*STDOUT;
while (<$infh>) {
my #vals = split ' ';
print $outfh join(' ', map {sum(#vals[#$_])} #indexes), "\n";
}
__DATA__
1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 1 1 1 0 1 1 1 1 1
Outputs:
5 6 7
5 7 8
5 6 7
5 6 7

PowerShell Replace CRLF in certain scenarios

I'm new to PowerShell and wish to replace CRLF in some scenarios within a textfile.
An Example textfile will be:
Begin 1 2 3
End 1 2 3
List asd asd
Begin 1 2 3
End 1 2 3
Begin 1 2 3
End 1 2 3
Sometest asd asd
Begin 1 2 3
Where a line isn't starting with Begin or End, I wish to append that line onto the previous one.
So the desired outcome would be:
Begin 1 2 3
End 1 2 3 List asd asd
Begin 1 2 3
End 1 2 3
Begin 1 2 3
End 1 2 3 Sometest asd asd
Begin 1 2 3
The file is Tab Seperated. So after Begin and End, is a TAB.
I tried the below, just to get rid of all the CRLF's, which doesn't work:
$content = Get-Content c:\test.txt
$content -replace "'r'n","" | Set-Content c:\test2.txt
I've read the MSDN on PowerShell and can replace text on different lines, just not over multiple lines like this :(
I'm at home testing on Windows 7, but this is for work and will be on Vista.
# read the file
$content = Get-Content file.txt
# Create a new variable (array) to hold the new content
$newContent = #()
# loop over the file content
for($i=0; $i -lt $content.count; $i++)
{
# if the current line doesn't begin with 'begin' or 'end'
# append it to the last line םכ the new content variable
if($content[$i] -notmatch '^(begin|end)')
{
$newContent[-1] = $content[$i-1]+' '+$content[$i]
}
else
{
$newContent += $content[$i]
}
}
$newContent
What do you think about this one line ?
gc "beginend.txt" | % {}{if(($_ -match "^End")-or($_ -match "^Begin")){write-host "`n$_ " -nonewline}else{write-host $_ -nonewline}}{"`n"}
Begin 1 2 3
End 1 2 3 List asd asd
Begin 1 2 3
End 1 2 3
Begin 1 2 3
End 1 2 3 Sometest asd asd
Begin 1 2 3
$data = gc "beginend.txt"
$start = ""
foreach($line in $data) {
if($line -match "^(Begin|End)") {
if($start -ne "") {
write-output $start
}
$start = $line
} else {
$start = $start + " " + $line
}
}
# This last part is a bit of a hack. It picks up the last line
# if the last line begins with Begin or End. Otherwise, the loop
# above would skip the last line. Probably a more elegant way to
# do it :-)
if($data[-1] -match "^(Begin|End)") {
write-output $data[-1]
}

Resources