I have a 229-residue protein, and I need to measure the distance from the center of mass of each of residues 1-12 (individually) to every other atom (residue 13 onwards). I also need this for each frame. So far I have this:
set pro [atomselect top "resid 1 and not water and not ion"]
set atom [atomselect top "index 207"]
set nf [molinfo top get numframes]
set outfile [open test207.dat w]
for {set i 0} {$i < $nf} {incr i} {
    puts "frame $i of $nf"
    $pro frame $i
    $atom frame $i
    set com1 [measure center $pro weight mass]
    set com2 [measure center $atom weight mass]
    set distance [veclength [vecsub $com1 $com2]]
    puts $outfile "$i $distance"
}
This works to the extent that it measures the distance between the first atom of residue 13 and the COM of residue 1 for all frames, but I'm unsure how to add a second loop that runs over every atom, instead of running the script thousands of times (changing the atom number each time) and producing thousands of files.
Is there a way to loop for each atom and for each frame in the same script?
Is this the kind of thing you are trying to do?
for {set resid 1} {$resid < 10} {incr resid} {
    set atomkey [format "resid %d and not water and not ion" $resid]
    puts $atomkey
    set filename [format "test%d.dat" $resid]
    set outfile [open $filename w]
    # Do stuff
    close $outfile
}
This results in atomkey and filename being built with the value of resid embedded in them, so this technique will let you create your key strings and file names. If you can't generate the values using a for loop, you can also look at the foreach loop:
foreach resid [list 1 2 4 6 7 99 12] {
    set atomkey [format "resid %d and not water and not ion" $resid]
    puts $atomkey
}
You can build the compound values just using set, but I like to use format as I'm a C programmer at heart.
set atomkey "resid $resid and.."
Normally this is how you put two loops together:
for {set j 0} {$j < $..} {incr j} {
    for {set i 0} {$i < $nf} {incr i} {
    }; # inner loop ends
}; # outer loop ends
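Applied to the original VMD problem, the two techniques combine into something like the sketch below: one output file, with a "frame atom-index distance" line per measurement, shown for residue 1 only (wrap it in another loop over resid 1-12 if needed). The selection strings and file name are assumptions, not taken from the original script:

```tcl
# Sketch: COM of residue 1 vs. every atom of residues 13-229, per frame.
set com_sel  [atomselect top "resid 1 and not water and not ion"]
set rest_sel [atomselect top "resid 13 to 229"]
set nf       [molinfo top get numframes]
set outfile  [open distances.dat w]

for {set i 0} {$i < $nf} {incr i} {
    $com_sel frame $i
    $rest_sel frame $i
    set com [measure center $com_sel weight mass]
    # inner loop: walk atom indices and coordinates in parallel
    foreach idx [$rest_sel get index] pos [$rest_sel get {x y z}] {
        puts $outfile "$i $idx [veclength [vecsub $com $pos]]"
    }
}
close $outfile
```

This avoids creating one atomselect (and one file) per atom; post-processing can split the single file by atom index if separate series are wanted.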
Related
I have some information in two large files.
One of them (file1.txt, ~4 million lines) contains all object names (which are unique) and their types.
The other (file2.txt, ~2 million lines) contains some object names (which can be duplicated) and some values assigned to them.
So, I have something like below in file1.txt:
objName1 objType1
objName2 objType2
objName3 objType3
...
And in file2.txt I have:
objName3 val3_1
objName3 val3_2
objName4 val4
...
For all the objects in file2.txt, I need to output the object names, their types, and the values assigned to them, in a single file like below:
objType3 val3_1 "objName3"
objType3 val3_2 "objName3"
objType4 val4 "objName4"
...
Previously, object names in file2.txt were supposed to be unique, so I implemented a solution where I read all the data from both files, save it to Tcl arrays, then iterate over the larger array and check whether an object with the same name exists in the smaller array, and if so, write the needed information to a separate file. But this runs too long (> 10 hours and it hasn't completed yet).
How can I improve my solution, or is there another way to do this?
EDIT:
Actually I don't have file1.txt; I find that data with a procedure and write it into a Tcl array. I run a procedure to get the object types and save them to a Tcl array; then I read file2.txt and save its data to another Tcl array; then I iterate over the items in the first array, and if an object name matches some object in the second (object values) array, I write the info to the output file and erase that element from the second array. Here is a piece of the code that I'm running:
set outFileName "output.txt"
if {[catch {open $outFileName "w"} fid]} {
    puts "ERROR: Failed to open file '$outFileName', no write permission"
    exit 1
}
# get object types
set TIME_start [clock clicks -milliseconds]
array set objTypeMap [list]
# here is some proc that fills up objTypeMap
set TIME_taken [expr {[clock clicks -milliseconds] - $TIME_start}]
puts "Info: Object types are found. Elapsed time $TIME_taken"
# read file2.txt
set TIME_start [clock clicks -milliseconds]
set file2 [lindex $argv 5]
if {[catch {set fp [open $file2 r]} errMsg]} {
    puts "ERROR: Failed to open file '$file2' for reading"
    exit 1
}
set objValData [read $fp]
close $fp
# tcl list containing lines of file2.txt
set objValData [split $objValData "\n"]
# remove last empty line
set objValData [lreplace $objValData end end]
array set objValMap [list]
foreach item $objValData {
    set objName [string range $item 0 [expr {[string first " " $item] - 1}]]
    set objValue [string range $item [expr {[string first " " $item] + 1}] end]
    set objValMap($objName) $objValue
}
# clear objValData
unset objValData
set TIME_taken [expr {[clock clicks -milliseconds] - $TIME_start}]
puts "Info: Object value data is read and processed. Elapsed time $TIME_taken"
# write to file
set TIME_start [clock clicks -milliseconds]
foreach { objName objType } [array get objTypeMap] {
    if { [array size objValMap] == 0 } {
        break
    }
    if { [info exists objValMap($objName)] } {
        set objValue $objValMap($objName)
        puts $fid "$objType $objValue \"$objName\""
        unset objValMap($objName)
    }
}
if { [array size objValMap] != 0 } {
    foreach { objName objVal } [array get objValMap] {
        puts "WARNING: Cannot find type for object $objName, skipped..."
    }
}
close $fid
set TIME_taken [expr {[clock clicks -milliseconds] - $TIME_start}]
puts "Info: Output is created. Elapsed time $TIME_taken"
It seems that for the last step (writing to the file) there are ~8 * 10^12 iterations to do, and it's not realistic to complete that in a reasonable time: I tried doing 8 * 10^12 iterations in a for loop that just prints the iteration index, and ~850 * 10^6 iterations took ~30 minutes (so the whole loop would finish in ~11 hours).
So, there should be another solution.
EDIT:
It seems the reason was some unsuccessful hashing of the file2.txt map: after I shuffled the lines in file2.txt, I got results in about 3 minutes.
Write the data to file1, and let an external tool do all the hard work (it's bound to be much more optimized for the task than home-spun Tcl code)
exec bash -c {join -o 0,1.2,2.2 <(sort file1.txt) <(sort file2.txt)} > result.txt
So… file1.txt is describing a mapping and file2.txt is the list of things to process and annotate? The right thing is to load the mapping into an array or dictionary where the key is the part that you will look things up by, and to then go through the other file line-by-line. That keeps the amount of data in memory down, but it's worth holding the whole mapping that way anyway.
# We're doing many iterations, so it's worth getting proper bytecode compilation
apply {{filename1 filename2 filenameOut} {
    # Load the mapping; uses memory proportional to the file size
    set f [open $filename1]
    while {[gets $f line] >= 0} {
        regexp {^(\S+)\s+(.*)} $line -> name type
        set types($name) $type
    }
    close $f
    # Now do the streaming transform; uses a small fixed amount of memory
    set fin [open $filename2]
    set fout [open $filenameOut "w"]
    while {[gets $fin line] >= 0} {
        # Assume that the mapping is probably total; if a line fails, we print it
        # as it was before. You might have a different preferred strategy here.
        catch {
            regexp {^(\S+)\s+(.*)} $line -> name info
            set line [format "%s %s \"%s\"" $types($name) $info $name]
        }
        puts $fout $line
    }
    close $fin
    close $fout
    # All memory will be collected at this point
}} "file1.txt" "file2.txt" "fileProcessed.txt"
Now, if the mapping is very large, so much that it doesn't fit in memory, then you might be better doing it via building file indices and stuff like that, but frankly then you're actually better off getting familiar with SQLite or some other database.
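For the database route, a rough sketch using the sqlite3 Tcl package (the table and column names here are made up for illustration; the point is that an indexed join replaces millions of hand-rolled lookups):

```tcl
package require sqlite3
sqlite3 db :memory:

db eval {
    CREATE TABLE types   (name TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE values_ (name TEXT, value TEXT);
}

# Load file2.txt; the name->type mapping would be inserted the same way.
set fp [open file2.txt r]
while {[gets $fp line] >= 0} {
    regexp {^(\S+)\s+(.*)} $line -> name value
    db eval {INSERT INTO values_ VALUES($name, $value)}
}
close $fp

# One join, driven by the PRIMARY KEY index on types.name
db eval {
    SELECT t.type AS type, v.value AS value, v.name AS name
    FROM values_ v JOIN types t ON t.name = v.name
} {
    puts "$type $value \"$name\""
}
db close
```

Wrapping the INSERT loop in a transaction (db eval BEGIN / COMMIT) speeds up bulk loading considerably.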
A pure-Tcl variant of Glenn Jackman's code would be
package require fileutil
package require struct::list
set data1 [lsort -index 0 [split [string trim [fileutil::cat file1.txt]] \n]]
set data2 [lsort -index 0 [split [string trim [fileutil::cat file2.txt]] \n]]
fileutil::writeFile result.txt [struct::list dbJoin -full 0 $data1 0 $data2]
But in this case each row will have four columns, not three: the two columns from file1.txt and the two columns from file2.txt. If that is a problem, reducing the number of columns to three is trivial.
The file join in the example is also full, i.e. all rows from both files will occur in the result, padded by empty strings if the other file has no corresponding data. To solve the OP's problem, an inner join is probably better (only rows that correspond are retained).
fileutil::cat reads the contents of a file, string trim removes leading and trailing whitespace from the contents, to avoid empty lines in the beginning or end, split ... \n creates a list where every row becomes an item, lsort -index 0 sorts that list based on the first word in every item.
The code is verified to work with Tcl 8.6 and fileutil 1.14.8. The fileutil package is a part of the Tcllib companion library for Tcl: the package can be individually upgraded to the current version by downloading the Tcl source and copying it to the relevant location in the Tcl installation's lib tree (C:\Tcl\lib\teapot\package\tcl\teapot\tcl8\8.2 in my case).
Quick-and-dirty install: download fileutil.tcl from here (use the Download button) and copy the file to where your other sources are. In your source code, call source fileutil.tcl and then package require fileutil. (There may still be compatibility problems with Tcl or with e.g. the cmdline package. Reading the source may suggest workarounds for such.) Remember to check the license conditions for conflicts.
Documentation: fileutil package, lsort, package, set, split, string, struct::list package
Maybe this is pretty stupid, but I really can't find a solution. I created two variables and want to transform them into lists.
These commands are tool-specific, but they work the way I want:
redirect max_transition {report_constraint -view $pargs(-scenario) -drv_violation_type {max_transition} -all_violators} -variable
redirect max_capacitance {report_constraint -view $pargs(-scenario) -drv_violation_type {max_capacitance} -all_violators} -variable
Now I want to create tcl lists out of them. I could use a loop, because the data has the same structure.
set reports {$max_transition $max_capacitance}
set report_length [llength $reports]
for {set i 0} {$i < $report_length} {incr i} {
    set tns_value 0
    set max_wns 0
    set vios 0
    set report [lindex $reports $i]
    puts $report
    # remove all unnecessary white space
    set no_space [regexp -all -inline {\S+} $report]
    # insert a new line for every path
    set insert_lines [string map {" U_" \nU_} $no_space]
    # create list out of result reports
    set report_list [split $insert_lines "\n"]
    if {[llength $report_list] > 1} {
        for {set i 1} {$i < [llength $report_list]} {incr i} {
            # get value of violation
            set slack [lindex [split [lindex $report_list $i] " "] 3]
            set tns_value [expr $tns_value + $slack]
            if {$vios == 0} {set max_wns $slack}
            incr vios
        }
    }
    # write out values
    puts "$pargs(-scenario), $report, $max_wns, $tns_value, $vios"
}
But this does not work out. The loop just puts out the variables' names (because of "puts $report") but not their content.
If I do it without a loop (so for each variable the same code consecutively), I get the lists I want.
So how can I process these variables as a whole in a loop?
The problem lies in the loop variable i in the loop below: it overrides the outer loop's variable. Try changing the inner loop's variable to j.
for {set i 1} {$i < [llength $report_list]} {incr i} {
    # get value of violation
    set slack [lindex [split [lindex $report_list $i] " "] 3]
    set tns_value [expr $tns_value + $slack]
    if {$vios == 0} {set max_wns $slack}
    incr vios
}
It's hard to write an answer for this since so much is unknown. To begin with, you should probably change to assignment by list and a foreach loop like this:
set reports [list $max_transition $max_capacitance]
foreach report $reports {
Since you don't really need to use a for loop here, it makes sense to simplify it. Please comment and I will iteratively improve the answer if I can.
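The underlying reason the original fails: braces suppress substitution, so {$max_transition $max_capacitance} is literally a two-element list of the strings "$max_transition" and "$max_capacitance", while [list ...] substitutes the values. A quick demonstration (the report contents are placeholders):

```tcl
set max_transition  "report A"
set max_capacitance "report B"

# Braces: no substitution happens, the list holds variable *names*
set braced {$max_transition $max_capacitance}
puts [lindex $braced 0]     ;# prints the literal string $max_transition

# list: substitution happens, the list holds the variables' *values*
set listed [list $max_transition $max_capacitance]
puts [lindex $listed 0]     ;# prints "report A"
```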
How can I retrieve a Modelsim signal value in the form x y into Tcl so I can process x and y individually?
Currently I have this line in Tcl to trace a signal value:
when {/currentstate/comp_occupy} {set comp [exa {/currentstate/comp_occupy}]}
This signal is a 2D array in Modelsim, which is shown like x y in the widget.
This snippet should trace that variable:
trace variable comp w grid_monitor
proc grid_monitor {name arrayindex op} {
    global comp flag_ttt cells
    if {$flag_ttt == 1} {
        puts $comp
        puts [llength $comp]
    }
}
What I get out of this proc looks like {x y}, but I have no idea how I can separate x and y. First I thought it was a list, but llength returns 1!
Any idea how I can go about doing this? Or rather, how can I turn it into a proper list?
Thanks
Since we established that the braces were literal braces, you can trim them out. Once done, you can then split to get a list:
proc grid_monitor {name arrayindex op} {
    global comp flag_ttt cells
    if {$flag_ttt == 1} {
        set new_comp [split [string trim $comp "{}"]]
        puts $new_comp
        puts [llength $new_comp]
    }
}
string trim will trim from $comp the characters contained within the quotes, that is { and }. split will then split the string on space to give a list.
And if you want to assign x and y to the above, you can use lindex or lassign (if you have Tcl8.5 or later):
proc grid_monitor {name arrayindex op} {
    global comp flag_ttt cells
    if {$flag_ttt == 1} {
        set new_comp [split [string trim $comp "{}"]]
        puts $new_comp
        puts [llength $new_comp]
        set x [lindex $new_comp 0]
        set y [lindex $new_comp 1]
        puts "x is $x and y is $y"
    }
}
Or...
set new_comp [split [string trim $comp "{}"]]
puts $new_comp
puts [llength $new_comp]
lassign $new_comp x y
puts "x is $x and y is $y"
In Tcl 8.5 the syntax to convert a string containing a valid list is to use the new expand operator:
set comp {*}$comp
It isn't clear whether current versions of Modelsim have upgraded beyond Tcl 8.4; if not, you need to do the same with eval:
eval set comp $comp
This uses the interpreter to do what it does best and avoids manually massaging the string.
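As a small illustration of both approaches (the value {3 4} is arbitrary): if $comp holds the string {3 4} with the braces as literal characters, expansion strips one level of list structure:

```tcl
set comp "{3 4}"        ;# the traced value, braces included
puts [llength $comp]    ;# 1 -- a single-element list

# Tcl 8.5+: expand the value in place, removing the outer braces
set comp {*}$comp
puts [llength $comp]    ;# 2
puts [lindex $comp 1]   ;# 4
```

The Tcl 8.4 equivalent would be eval set comp $comp, which concatenates before evaluation and so achieves the same un-nesting.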
#!/bin/sh
# This is a trial program
puts "++++++++++++++++++++++++++++++++++++++++++++++++++"
set y "0.0.0.0"
set z [split $y "."]
puts "z=$z"
lreplace $z 0 5
puts "z $z"
set v [llength $z]
puts "length of array= $v"
puts "in the loop-------->\n"
puts " "
incr v -1
puts $v
for {set ml $v} {$ml >= 0} {puts "$ml =ml"} {
    for {set nl [lindex $z $ml]} {$nl >= 4} {puts "$nl=nl"} {
        puts $nl
        after 2000
        lset z $ml $nl
        incr $nl
    }
    after 2000
    incr ml -1
}
I am not able to enter the second for loop. Is this a formatting issue? It gives me some weird error. I added the sleep just to check what's happening, so ignore that.
In your code, the inner loop only runs while $nl >= 4, but nl is initialized to 0 from [lindex $z $ml], so the loop body is never entered.
Since you are incrementing $nl, my guess is you should change this line:
for { set nl [lindex $z $ml]} { $nl >=4} { puts "$nl=nl"} {
to this instead:
for { set nl [lindex $z $ml]} { $nl <=4} { puts "$nl=nl"} {
Was it perchance something like this you intended?
# This is a trial program
puts "++++++++++++++++++++++++++++++++++++++++++++++++++"
set y "0.0.0.0"
set z [split $y "."]
puts "\$z=$z"
set v [llength $z]
# the term 'array' means associative array in Tcl, better use 'list'
puts "length of list= $v"
puts "in the loop-------->\n\n"
incr v -1
puts "\$v=$v"
for {set ml $v} {$ml >= 0} {incr ml -1} {
    for {set nl [lindex $z $ml]} {$nl <= 4} {incr nl} {
        lset z $ml $nl
        puts $z
    }
}
Note that I've moved the incr command invocations to the third argument (the next command string, as the documentation puts it) of the for command invocations. You can put anything you want to run at the end of each iteration there, including puts commands as you did, but it's a convention and good practice to have the loop-control-changing commands (whatever they may be) there, and not much else.
Here is an excerpt from a data file, my starting point:
Marker Distance_1 Distance_2 ID
.
.
.
30 13387412 34.80391242 seq-SN_FIRST
31 13387444 34.80391444 seq-SN_Second
31.1
31.2
32 13387555 39.80391 seq-SN_Third
.
.
.
This is a tab-delimited file with four elements per row. The first row is a header; after that come numerous rows of data. The vertical dots are not actually in the real file; they just indicate that data similar to the rows shown occurs before and after the example rows explicitly shown.
Some of the data rows are "full", that is, all four cell entries contain something. Other rows are "blank", with only a first actual entry followed by 3 tab-delimited single spaces. The white spaces in the blank rows need to be "filled in" by linear interpolation, using the corresponding cell entries of the immediately preceding and immediately succeeding full rows. For example, missing Distance_1 values in column 2 will be interpolated using the value 13387444 of the preceding row and the value 13387555 of the succeeding row. Likewise for the column 3 values. Column 4 values are just ignored here.
The first goal of the script is to identify the blocks of data that require filling and their flanking "full" lines. Blank lines will contain 3 tabbed single spaces and will be ID'd that way. Once found, the consecutive sets of blank lines plus flanking full lines are sent to subroutine for interpolation.
#!/usr/bin/perl
use strict;
use warnings;

die "usage: [ map positions file post SAS ]\n\n" unless @ARGV == 1;

my @file = ();
while (my $line = <>) {
    chomp $line;
    push(@file, $line);
}

my @consecutive_lines = (); # collects a current set of consecutive lines requiring linear interpolation
my @filled = ();            # my final array, hopefully correctly filled in

#####
# search for consecutive set of lines in @file
#####
for (my $i = 0; $i < $#file; $i++) {                 # $#file returns the index of the last element in @file
    if ($file[$i] !~ /(?:\t\s){3}/) {                # if not a blank line but a "full" line
        push(@filled, $file[$i]);                    # push the header and full lines, until...
    }
    elsif ($file[$i] =~ /(?:\t\s){3}/) {             # ...a blank line is found
        push(@consecutive_lines, $file[$i - 1]);     # push preceding full line into array
        while ($file[$i] =~ /(?:\t\s){3}/ and $i < $#file) {  # keep pushing lines, so long as they
                                                              # are blank, or until end of file
            push(@consecutive_lines, $file[$i++]);
        }
        push(@consecutive_lines, $file[$i]);         # we reached the next full line, so push it into the array
        my @fillme = linearInterpolation(@consecutive_lines); # send set of lines out for filling
        push(@filled, @fillme);                      # push filled-in set of lines into the final array
        @consecutive_lines = ();                     # reset array @consecutive_lines for next cycle
    } # end of elsif
} # end of for loop
Thanks to user @Kenosis for lots of help with the above, which I have since modified (hopefully not mangled).
Next is the linear interpolation. It is here where I am trying to link the first phase of the script to the second phase. And it is not working well so far.
My goal is to hand the array @incoming off to the subroutine. This array is then split up, so that the actual cell entries are "visible", can be indexed, and so can be called upon. I have been trying to figure out how to do this for the column 2 (Distance_1) values first. I feel like this script gets close, but it begins to stray at the point just after the interpolated values are calculated.
#####
# subroutine linear interpolation
#####
sub linearInterpolation {
    my @incoming = @_;          # array of consecutive set of lines
    my @splitup;                # new array, will be a "split up" version of @incoming
    my ($A, $B, $C, $D, $E);    # variables for linear interpolation
    my @fillme;                 # declaring the "empty" array to be filled in
    my @dist_1_fills;           # array of interpolated values for dist_1

    for (my $i = 0; $i < scalar @incoming; $i++) {   # loop to split up lines of @incoming
                                                     # into indexed cell entries
        chomp $incoming[$i];                         # and make a new array of them
        my @entries = split('\t', $incoming[$i]);
        push(@splitup, @entries);
    }

    $A = $splitup[1];           # cell entry in column 2 of preceding full line
    $B = $splitup[-3];          # cell entry in column 2 of succeeding full line
    $C = $splitup[2];           # cell entry in column 3 of preceding full line
    $D = $splitup[-2];          # cell entry in column 3 of succeeding full line
    $E = scalar @incoming - 1;  # equals number of lines in the set minus 1

    for (my $i = 1; $i < $E; $i++) {  # need to find the appropriate number of interpolated
                                      # values, given the number of lines in the consecutive set
        my @dist_1_fills = interpvalues($A, $B, $E, $i);
        for ($i = 0; $i < scalar @splitup; $i += 4) {
            push(@fillme, $splitup[$i], $dist_1_fills[$i], "dist_2_fills", "--");
            # fourth column values will be ignored or filled with --.
            # "dist_2_fills" just occupying its proper spot until I can figure out the distance 1 fills
        }
    }
}

#########
sub interpvalues {              # subroutine to find interpolated values
    my ($A, $B, $E, $i) = @_;
    my $dist_1_answers = (($B - $A) / ($E)) * $i + $A;
    return $dist_1_answers;
}
The code gets confused in the second part that deals with finding the interpolated values and sending them back to the first part of the code to finally fill in the data set. I think specifically my biggest (though probably not my only) problem is trying to fill in the blank lines with the proper values after they have been calculated in the second subroutine.
Any hints and clues are greatly appreciated!
This program will do what you need. It expects the input file name as a parameter on the command line.
use strict;
use warnings;

my @saved;
my @needed;

while (<>) {
    chomp;
    my @fields = split /\t/;

    # Pass through headers and junk
    unless ($fields[0] and $fields[0] =~ /\d/) {
        print "$_\n";
        next;
    }

    # Save x-value for records without a y-value
    if ($fields[1] !~ /\d/) {
        push @needed, $fields[0];
        next;
    }

    # We have a filled-out row. Calculate any intermediate missing ones
    if (@needed) {
        if ($saved[0] == $fields[0]) {
            die sprintf qq(Duplicate marker values %.1f at line %d of "%s"\n), $saved[0], $., $ARGV;
        }
        my ($a1, $b1) = solve_linear(@saved[0,1], @fields[0,1]);
        my ($a2, $b2) = solve_linear(@saved[0,2], @fields[0,2]);
        while (@needed) {
            my $x = shift @needed;
            my $y1 = $a1 * $x + $b1;
            my $y2 = $a2 * $x + $b2;
            print join("\t", $x, $y1, $y2), "\n";
        }
    }

    print "$_\n";
    @saved = @fields;
}

sub solve_linear {
    my ($x0, $y0, $x1, $y1) = @_;
    my ($dx, $dy) = ($x1 - $x0, $y1 - $y0);
    my $aa = $dy / $dx;
    my $bb = ($y0 * $dx - $x0 * $dy) / $dx;
    return ($aa, $bb);
}
output
Marker Distance_1 Distance_2 ID
.
.
.
30 13387412 34.80391242 seq-SN_FIRST
31 13387444 34.80391444 seq-SN_Second
31.1 13387455.1 35.303913996 --
31.2 13387466.2 35.803913552 --
32 13387555 39.80391 seq-SN_Third
.
.
.
I modified the code to this, so that the linear interpolation is based not on the values in the first column but rather on the values in the second and third columns. Thanks especially to users @Kenosis and @Borodin. I have accepted Kenosis' answer to a previous question, and I have accepted Borodin's here, even though I post this revision in the "answer your own question" section. Is posting a revision here acceptable? I skimmed the FAQ about this but haven't found anything relevant yet.
#!/usr/bin/perl
use strict;
use warnings;

my @saved;
my @needed;

while (<>) {
    chomp;
    my @fields = split /\t/;

    # Does the current line actually exist AND does it contain one or more digits?
    unless ($fields[0] and $fields[0] =~ /\d/) {
        # If no, this is the header, so print it. If yes, advance.
        print "$_\n";
        # After printing the header, go back to <> and read in the next line.
        next;
    }

    # Is the second cell of the current line devoid of digits?
    if ($fields[1] !~ /\d/) {
        # If no, advance. If yes, remember $fields[0], the Marker.
        push @needed, $fields[0];
        # After pushing, go back to <> and read in the next line.
        next;
    }

    # If we are here, we must have a filled-out row.
    # Does @needed have any values? If no, advance. If yes,
    if (@needed) {
        if ($saved[0] == $fields[0]) {
            die sprintf qq(Duplicate marker values %.1f at line %d of "%s"\n), $saved[0], $., $ARGV;
        }
        # Else send preceding dist_1 value, succeeding dist_1 value,
        # preceding dist_2 value, succeeding dist_2 value,
        # and number of empty lines to the subroutine.
        my ($dist_1_interval, $dist_2_interval) =
            interval_sizes($saved[1], $fields[1], $saved[2], $fields[2], scalar @needed);
        # Current size of @needed is saved as $size and is used to help with iteration.
        my $size = scalar @needed;
        # So long as @needed contains values...
        while (@needed) {
            # ...remove the left-most Marker value from array @needed.
            my $x = shift @needed;
            # Interpolated values for dist_1 and dist_2 are
            # (respective interval size x iteration of while loop) + preceding values.
            my $new_dist_1 = ($dist_1_interval * (1 + ($size - (scalar @needed + 1)))) + $saved[1];
            my $new_dist_2 = ($dist_2_interval * (1 + ($size - (scalar @needed + 1)))) + $saved[2];
            print join("\t", $x, $new_dist_1, $new_dist_2, "--"), "\n";
        }
    }

    # We are here since the current line is already a filled-in row.
    # Print this row and assign it to @saved. Return to <>.
    print "$_\n";
    @saved = @fields;
}

sub interval_sizes {
    # $A = preceding dist_1, $B = succeeding dist_1,
    # $C = preceding dist_2, $D = succeeding dist_2,
    # $E = number of needed distances.
    my ($A, $B, $C, $D, $E) = @_;
    # Interval size for dist_1, based on the difference between $B and $A.
    my $dist_1_interval = ($B - $A) / ($E + 1);
    # Interval size for dist_2, based on the difference between $D and $C.
    my $dist_2_interval = ($D - $C) / ($E + 1);
    return ($dist_1_interval, $dist_2_interval);
}