I have tcl arrays having the names as 0,1,2,.. and the values contain meaningful information.
I want to use a single command like [array names $my_array] to get the values of the array.
Right now I only see this option,(lengthy but gets job done)
for {set index 0} {$index<[array size my_array]} {incr index} {
lappend my_list_values $my_array($index)
}
You can use array get to fetch all the elements of the array, and a foreach loop that consumes multiple elements of its list in each iteration (Plus sorting by key to get a reproducible order):
% set foo(0) a
a
% set foo(1) b
b
% set foo(2) c
c
% foreach {key val} [lsort -integer -stride 2 [array get foo]] { lappend values $val }
% puts $values
a b c
%
Or with array names:
foreach key [lsort -integer [array names foo]] { lappend values $foo($key) }
Here's a proc implementing a foreach functionality for arrays. In Tcl 8.7 this will be builtin: array for
# A note on performance: we're not saving any time with this approach.
# This is essentially `foreach name [array names ary] {...}
# We are saving memory: iterating over the names versus extracting
# them all at the beginning.
#
proc array_foreach {vars arrayName body} {
if {[llength $vars] != 2} {
error {array foreach: "vars" must be a 2 element list}
}
lassign $vars keyVar valueVar
# Using the complicated `upvar 1 $arrayName $arrayName` so that any
# error messages propagate up with the user's array name
upvar 1 $arrayName $arrayName \
$keyVar key \
$valueVar value
set sid [array startsearch $arrayName]
# If the array is modified while a search is ongoing, the searchID will
# be invalidated: wrap the commands that use $sid in a try block.
try {
while {[array anymore $arrayName $sid]} {
set key [array nextelement $arrayName $sid]
set value [set "${arrayName}($key)"]
uplevel 1 $body
}
} trap {TCL LOOKUP ARRAYSEARCH} {"" e} {
puts stderr [list $e]
dict set e -errorinfo "detected attempt to add/delete array keys while iterating"
return -options $e
} finally {
array donesearch $arrayName $sid
}
return
}
You can add this to the array ensemble:
set map [namespace ensemble configure array -map]
dict set map foreach ::array_foreach
namespace ensemble configure array -map $map
then
% array set a {foo bar baz qux}
% array foreach {key value} a {puts "key=$key, value=$value"}
key=foo, value=bar
key=baz, value=qux
I have a list of files and a separate list of sizes of those files using "file size <file_name>".
I am required to sort the files in ascending order based on the size and then feed it further for processing.
Can someone provide a step by step process I could follow?
This is what I have done so far
set direc "<Any direcotry to look files at>"
set folderFiles [glob -directory $direc -nocomplain -type f *.xml]
set fileSizes []
puts "Files to be processed are:"
puts "$folderFiles"
puts "Sizes of files in this order are:"
foreach tempFile $folderFiles {
lappend fileSizes [file size $tempFile]
}
puts $fileSizes
set fileDict [dict create [lindex $folderFiles 0] [lindex $fileSizes 0]]
for {set i 1} {$i < [llength $folderFiles]} {incr i} {
dict lappend fileDict [lindex $folderFiles $i] [lindex $fileSizes $i]
}
puts $fileDict
So, this gives me a dictionary where keys -> files and values -> file sizes. I just need to sort this dictionary based on values which are file sizes.
The first thing you need to do is to get the list of filenames and their sizes. You can keep the sizes separately.
set filenames [glob -type f *.foo]; # Or whatever
set sizes [lmap f $filenames {file size $f}]
Then we sort the sizes, but get the indices of the sort back rather than the sorted list.
set indices [lsort -indices -integer $sizes]
Now, we use those indices to construct the sorted filenames:
set filenames [lmap idx $indices {lindex $filenames $idx}]
We can combine some of these things into a helper procedure:
proc SortFilesBySize {filenames} {
set sizes [lmap f $filenames {file size $f}]
return [lmap idx [lsort -indices -integer $sizes] {lindex $filenames $idx}]
}
set filenames [glob -type f *.foo]; # Or whatever
puts [join [SortFilesBySize $filenames] "\n"]
One way:
#!/usr/bin/env tclsh
proc zip {list1 list2} {
lmap a $list1 b $list2 { list $a $b }
}
proc heads {pairs} {
lmap pair $pairs { lindex $pair 0 }
}
proc sort_by_size {names sizes} {
heads [lsort -integer -increasing -index 1 [zip $names $sizes]]
}
set names {a.txt b.txt c.txt}
set sizes {3 2 1}
puts [sort_by_size $names $sizes]
Combines the names and sizes into a list of pairs of filename and size, sorts based on size, and then returns just the reordered filenames. Essentially a tcl version of perl's classic Schwartzian Transform idiom.
#!/usr/bin/expect -f
set myarr1(chicken) animal
set myarr1(cows) animal
set myarr1(tiger) animal
set myarr1(horse) animal
set myarr2(carrot) vegetable
set myarr2(tomato) vegetable
set myarr2(potato) vegetable
set myarr2(pea) vegetable
set arr_list { myarr1 myarr2 }
foreach key [array names [lindex $arr_list 0]] {
puts "${key}=$[lindex $arr_list 0]($key)"
}
foreach key [array names [lindex $arr_list 1]] {
puts "${key}=$[lindex $arr_list 1]($key)"
}
Output obtained:
cows=$myarr1(cows)
horse=$myarr1(horse)
chicken=$myarr1(chicken)
tiger=$myarr1(tiger)
tomato=$myarr2(tomato)
pea=$myarr2(pea)
potato=$myarr2(potato)
carrot=$myarr2(carrot)
Required output:
cows=animal
horse=animal
chicken=animal
tiger=animal
tomato=vegetable
pea=vegetable
potato=vegetable
carrot=vegetable
I am able to get the required output if I use the following in foreach loop:
foreach key [array names myarr1] {
puts "${key}=$myarr1($key)"
}
foreach key [array names myarr2] {
puts "${key}=$myarr2($key)"
}
I am trying to create a list of array names and then loop through that list of array names and print it. If there is a better way to approach this problem, I am all ears. Thanks for the assist !
You really ought to use nested foreachs for that.
foreach arr $arr_list {
foreach key [array names $arr] {
puts "${key}=$$arr($key)"
}
}
Except that doesn't work! Why? It's easy on one level: the syntax for $ doesn't support such complexity; it really only supports a (very useful) subset of legal variable names and can't do complicated substitutions (array element names support more diverse options). We need to rewrite to use the single-argument set form as a first step:
foreach arr $arr_list {
foreach key [array names $arr] {
puts "${key}=[set [set arr]($key)]"
}
}
That works, but isn't very elegant (or fast, for that matter). It's actually better to use upvar 0 to make a local alias to the array that you're processing; variable alias from a to whatever was talked about with $arr will let us shorten things elsewhere, and it's pretty elegant in practice:
foreach arr $arr_list {
upvar 0 $arr a
foreach key [array names a] {
puts "$key=$a($key)"
}
}
You can also do things like sorting the list of element names (array names does not guarantee to return things in any particular order), putting spacing in between each of the arrays that you print out, etc. But that's the core of how to improve things. You can also use array get instead of array names and a multi-variable foreach, but then sorting the keys is more awkward (well, before Tcl 8.6's lsort gained the -stride option).
If you just want to print them, use the bundled parray proc:
foreach arr $arr_list {
parray $arr
}
which outputs
myarr1(chicken) = animal
myarr1(cows) = animal
myarr1(horse) = animal
myarr1(tiger) = animal
myarr2(carrot) = vegetable
myarr2(pea) = vegetable
myarr2(potato) = vegetable
myarr2(tomato) = vegetable
You can use the foreach command to traverse both the keys and the values in the array.
array set arr {a 1 b 2 c 3}
foreach {k v} [array get arr] {
puts $k=$v
}
References: foreach
You're on the right track with the foreach command, but to loop over the list of arrays you need to nest it:
foreach arr $arr_list {
foreach {key val} [array get arr] {
puts "$key = $val"
}
}
Also, look into the parray command, and see if that style of printing works for what you need. (I'd link it, but the site appears to be down at the moment. I'll try to remember to edit this later. Check out the command details at http://www.tcl.tk/man/tcl8.6/)
What you probably want is:
foreach arr $arr_list {
foreach key [array names $arr] {
puts "$key=[set ${arr}($key)]"
}
}
That said, you’re using arrays in a non-standard way that possibly is suboptimal. Have you looked into dictionaries? (Unfortunately, discussing how your data should be structured isn’t a Stackoverflow topic.)
Here is an excerpt from a data file, my starting point:
Marker Distance_1 Distance_2 ID
.
.
.
30 13387412 34.80391242 seq-SN_FIRST
31 13387444 34.80391444 seq-SN_Second
31.1
31.2
32 13387555 39.80391 seq-SN_Third
.
.
.
This is a tab-delimited file of multiple rows of four elements each. First row is a header. After that, numerous rows of data. The vertical dots are not actually in the real file, but they are here just to represent that data similar to the actual rows shown occur before and after the example fo rows explicitly shown.
Some of the data rows are "full", that is, all four cell entries contain something. Other rows are "blank", with only a first actual entry but followed by 3 tab delimited single spaces. Those white spaces in the blank rows need to be "filled in." The filling in will be done by linear interpolation, using the corresponding cell entries of the immediately preceding and immediately succeeding rows. For example, missing Distance_1 values, in column 2, will be interpolated using the value 13387444 of the preceding row and the value 13387555 of the succeeding row. Similarly so for the column 3 values. Column 4 values are just ignored here.
The first goal of the script is to identify the blocks of data that require filling and their flanking "full" lines. Blank lines will contain 3 tabbed single spaces and will be ID'd that way. Once found, the consecutive sets of blank lines plus flanking full lines are sent to subroutine for interpolation.
#!/usr/bin/perl
use strict;
use warnings;
die "usage: [ map positions file post SAS ]\n\n" unless #ARGV == 1;
my #file = ();
while (my $line = <$ARGV[0]>) {
chomp $line;
push(#file, $line);
}
my #consecutive_lines = (); # array collects a current set of consecutive lines requiring linear interpolation
my #filled = (); # my final array, hopefully correctly filled in
#####
# search for consecutive set of lines in #file
#####
for (my $i = 0; $i < $#file; $i++) { # $#file returns the index of the last element in #file
if ($file[$i] !~ /(?:\t\s){3}/) { # if not a blank line
# but a "full line"
push(#filled, $file[$i]); # push the header and full lines, until...
}
elsif ($file[$i] =~ /(?:\t\s){3}/) { # ...a blank line is found
push (#consecutive_lines, $file[$i - 1]); # push preceding full line into array
while ($file[$i] =~ /(?:\t\s){3}/ and $i < $#file) { # keep pushing lines, so long as they are blank
# or end of file
push(#consecutive_lines, $file[$i++]);
}
push(#consecutive_lines, $file[$i]) ; # else we reach next full line, so push it into array
my #fillme = linearInterpolation(#consecutive_lines); # send set of lines out for filling
push(#filled, #fillme); # push filled in set of lines into the final array
#consecutive_lines = (); # reset or undef array #consecutive_lines for next cycle
} # end of elsif
} # end of for loop
Thanks to user #Kenosis for lots help with the above, which I have since modified (hopefully not mangled).
Next is the linear interpolation. It is here where I am trying to link the first phase of the script to the second phase. And it is not working well so far.
My goal is to hand off the array #incoming to the subroutine. This array is then split up, so that the actual cell entries are "visible" and can be indexed by the array, and so called upon. I have been trying to figure out how to do this for the column 2 values for Distance_1 first. I feel like this script gets close and it begins to stray at the point just after the interpolated values are calculated.
#####
# subroutine linear interpolation
#####
sub linearInterpolation {
my #incoming = #_; # array of consecutive set of lines
my #splitup; # declare new array, will be a "split up" version of #incoming
my ($A, $B, $C, $D, $E); # variables for linear interpolation
my #fillme; # declaring the "emtpy" array to be filled in
my #dist_1_fills; # array of interpolated values for dist_1
for (my $i = 0;
$i < scalar #incoming; $i++) # loop to split up lines of #incoming
{ # into indexed cell entries
chomp $incoming[$i]; # and make new array of them
my #entries = split('\t', $incoming[$i]);
push(#splitup, #entries);
}
$A = $splitup[1]; # cell entry in column 2 of preceding full line
$B = $splitup[-3]; # cell entry in column 2 of succeeding full line
$C = $splitup[2]; # cell entry in column 3 of preceding full line
$D = $splitup[-2]; # cell entry in column 3 of succeeding full line
$E = scalar #incoming - 1; # equals number of lines in the set minus 1
for (my $i = 1; $i < $E; $i++) { # need to start finding appropriate
# number interpolated values, given number of
my #dist_1_fills =
interpvalues($A, $B, $E, $i); # of lines in consecutive set of lines
for ($i = 0; $i < scalar #splitup; $i += 4) {
push(#fillme, $splitup[$i], $dist_1_fills[$i], "dist_2_fills", "--");
# fourth column values will be ignored or filled with --.
# "dist_2_fills" just occupying it's proper spot until I can figure out distance 1 fills
}
}
}
#########
sub interpvalues { # subroutine to find interpolated values
my ($A, $B, $E, $i) = #_;
my $dist_1_answers = (($B - $A) / ($E)) * $i + $A;
return $dist_1_answers;
}
The code gets confused in the second part that deals with finding the interpolated values and sending them back to the first part of the code to finally fill in the data set. I think specifically my biggest (though probably not my only) problem is trying to fill in the blank lines with the proper values after they have been calculated in the second subroutine.
Any hints and clues are greatly appreciated!
This program will do what you need. It expects the inout file name as a parameter on the command line.
use strict;
use warnings;
my #saved;
my #needed;
while (<>) {
chomp;
my #fields = split /\t/;
# Pass hrough headers and junk
unless ($fields[0] and $fields[0] =~ /\d/) {
print "$_\n";
next;
}
# Save x-value for records without a y-value
if ($fields[1] !~ /\d/) {
push #needed, $fields[0];
next;
}
# We have a filled-out row. Calculate any intermediate missing ones
if (#needed) {
if ($saved[0] == $fields[0]) {
die sprintf qq(Duplicate marker values %.1f at line %d of "%s"\n), $saved[0], $., $ARGV;
}
my ($a1, $b1) = solve_linear(#saved[0,1], #fields[0,1]);
my ($a2, $b2) = solve_linear(#saved[0,2], #fields[0,2]);
while (#needed) {
my $x = shift #needed;
my $y1 = $a1 * $x + $b1;
my $y2 = $a2 * $x + $b2;
print join("\t", $x, $y1, $y2), "\n";
}
}
print "$_\n";
#saved = #fields;
}
sub solve_linear {
my ($x0, $y0, $x1, $y1) = #_;
my ($dx, $dy) = ($x1 - $x0, $y1 - $y0);
my $aa = $dy / $dx;
my $bb = ($y0 * $dx - $x0 * $dy) / $dx;
return ($aa, $bb);
}
output
Marker Distance_1 Distance_2 ID
.
.
.
30 13387412 34.80391242 seq-SN_FIRST
31 13387444 34.80391444 seq-SN_Second
31.1 13387455.1 35.303913996 --
31.2 13387466.2 35.803913552 --
32 13387555 39.80391 seq-SN_Third
.
.
.
Tool completed successfully
I modified the code to this, so that the linear interpolation is based not on the values in the first column but rather on the values in the second and third columns. Thanks especially to users #Kenosis and #Borodin. I have accepted Kenosis' answer to a previous question, and I have accepted Borodin's here, even though I post this revision in the "answer your own question" section. Is posting a revision here acceptable? I skimmed the FAQ about this but haven't found anything relevant yet.
#!/usr/bin/perl
use strict; use warnings;
my #saved;
my #needed;
while (<>) {
chomp;
my #fields = split /\t/;
# Does the current line actually exist AND does it contain one or more digits.
unless ($fields[0] and $fields[0] =~ /\d/) {
# If no, this is the header, so print it. If yes, advance.
print "$_\n";
#after printing header, go back to <> and read in next line.
next;
}
# Is the second cell of the current line devoid of digits?
if ($fields[1] !~ /\d/) {
# If no, advance. If yes, remember $field[0], the Marker.
push #needed, $fields[0];
# After pushing, go back to <> and read in next line.
next;
}
# If we are here, we must have a filled-out row.
# Does #needed have any values? If no, advance. If yes,
if (#needed) {
if ($saved[0] == $fields[0]) {
die sprintf qq(Duplicate marker values %.1f at line %d of "%s"\n), $saved[0], $., $ARGV;
}
# Else send preceding dist_1 value, succeeding dist_1 value,
# preceding dist_2 value, succeeding dist_2 value,
# and number of emtpy lines to subroutine.
my ($dist_1_interval, $dist_2_interval) = interval_sizes($saved[1], $fields[1], $saved[2], $fields[2], scalar #needed);
# Current size of #needed is saved as $size and is used to help with iteration.
# So long as #needed contains values...
my $size = scalar #needed;
while (#needed) {
# ...remove left-most Marker value from array #needed.
my $x = shift #needed;
# Interpolated values for dist_1 and dist_2 are
# (respective interval size x iteration of while loop) + preceding values.
my $new_dist_1 = ($dist_1_interval * (1 + ($size - (scalar #needed + 1)))) + $saved[1];
my $new_dist_2 = ($dist_2_interval * (1 + ($size - (scalar #needed + 1)))) + $saved[2];
print join("\t", $x, $new_dist_1, $new_dist_2, "--"), "\n";
}
}
# We are here since current line is already a filled-in row.
print "$_\n";
# Print this row and assign it to #saved. Return to <>.
#saved = #fields;
}
sub interval_sizes {
# $A = preceding dist_1, $B = succeeding dist_1,
# $C = preceding dist_2, $D = succeeding dist_2,
# $E = number of needed distances.
my ($A, $B, $C, $D, $E) = #_;
# I need an interval size for dist_1 based on difference between $B and $A.
my $dist_1_interval = ($B - $A)/($E + 1);
# I need an interval size for dist_2 based on difference between $D and $C.
my $dist_2_interval = ($D - $C)/($E + 1);
return ($dist_1_interval, $dist_2_interval);
}
Let's assume that thefilevalue_$thefile is an array that contains lists
foreach element [array names thefilevalue_$thefile] {
puts "[lindex $thefilevalue_[subst $thefile]($element) 0]"
}
but it returns :
can't read "thefilevalue_": no such variable
i am in tcl 8.4 and i con't upgrade it.
how can i fix it ?
Thanks
Use set and escape the parentheses, e.g.
array set thefilevalue_test {reds {orange red purple} blues {green blue purple}}
set thefile test
foreach element [array names thefilevalue_$thefile] {
puts [lindex [set thefilevalue_$thefile\($element\)] 0]
}
This outputs for me (Tcl 8.0.5, and I can't upgrade either):
orange
green