Write a Tcl string to the middle of a binary file without overwriting its content

I have a binary file
into which I am trying to insert a string in the middle
(let's say after 10 bytes).
I manage to overwrite the file's contents with my string, but not to insert it without clobbering what is already there.
I'd appreciate it if someone could tell me how I can insert the string.
Here is my code example:
proc write_bit_header {} {
    set bit_hdr "#Here is my new string to be added#"
    set bit_hdr_len [string length ${bit_hdr}]
    set outBinData [binary format a${bit_hdr_len} ${bit_hdr}]
    set fp [open "binfile" "a+b"]
    fconfigure $fp -translation binary
    seek $fp 10
    puts -nonewline $fp $outBinData
    close $fp
}

When you write to the middle of a file (which you'd use the mode r+b for), none of the other bytes in the file move around. They're still at exactly the same offsets within the file that they were beforehand. If you're writing a fixed-size binary record into the file, this is exactly what you want! However, if you're writing a variable-sized record, you have to:
1. read all the data that is going to go after the bytes that you want to write,
2. seek to the place where you want to do the insert/replace,
3. write the data that you are inserting,
4. write the data that you read in step 1, and
5. truncate the file (in case what you wrote in step 3 is shorter than what you were replacing).
Yes, this is non-trivial!
proc insertData {filename dataToInsert insertionPoint {firstAfterByte ""}} {
    # If you don't give the end of the range to overwrite, it's zero-length
    if {$firstAfterByte eq ""} {
        set firstAfterByte $insertionPoint
    }
    set f [open $filename "r+b"]
    chan seek $f $firstAfterByte
    set suffixData [chan read $f]
    chan seek $f $insertionPoint
    chan puts -nonewline $f $dataToInsert
    chan puts -nonewline $f $suffixData
    chan truncate $f
    close $f
}
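For illustration, a hypothetical call for the question's own case (the filename, the 10-byte offset, and the header string all come from the question):
# Insert the header at byte offset 10 of "binfile", pushing the
# remaining bytes further down instead of overwriting them.
insertData "binfile" "#Here is my new string to be added#" 10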
It's much easier when you're appending, as you don't have to move any existing data around and never need to truncate. And you can use the ab mode so that you don't need to seek explicitly.
proc appendData {filename dataToAppend} {
    set f [open $filename "ab"]
    puts -nonewline $f $dataToAppend
    close $f
}
As you can see, the insertion code is quite a lot more tricky. It also runs a real risk of going wrong part-way through. It's better to work on a copy of the file, and then replace the original at the end:
proc insertDataSafely {filename dataToInsert insertionPoint {firstAfterByte ""}} {
    set f_in [open $filename "rb"]
    set f_out [open ${filename}.tmp "wb"]
    try {
        # Copy the prefix, write the insertion, then copy the rest
        chan copy $f_in $f_out -size $insertionPoint
        puts -nonewline $f_out $dataToInsert
        if {$firstAfterByte ne ""} {
            chan seek $f_in $firstAfterByte
        }
        chan copy $f_in $f_out
        chan close $f_in
        chan close $f_out
    } on ok {} {
        file rename -force ${filename}.tmp $filename
    } on error {msg opt} {
        file delete ${filename}.tmp
        # Reraise the error
        return -options $opt $msg
    }
}
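The call shape is the same; for example, a hypothetical use of the optional fourth argument (these particular values are illustrative, not from the question):
# Replace bytes 10..19 of "binfile" with the new string; the 20 is
# the firstAfterByte argument, i.e. the first byte NOT replaced.
insertDataSafely "binfile" "#Here is my new string to be added#" 10 20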
Of course, not all files take kindly to this sort of thing being done in the first place, but the list of ways in which modifying an arbitrary file can make things go haywire is long and thoroughly out of scope for this question.

How to read file from end to start (in reverse order) in TCL?

I have a very large text file from which I have to extract some data. I read the file line by line and look for keywords, and I know that the keywords I am looking for are much closer to the end of the file than to the beginning.
I tried the tac approach:
set fh [open "|tac filename"]
but I get the error: couldn't execute "tac": no such file or directory
My file is big, so I am not able to store the lines in a loop and reverse them. Please suggest a solution.
tac is itself a fairly simple program -- you could just implement its algorithm in Tcl, at least if you're determined to literally read each line in reverse order. However, I think that constraint is not really necessary -- you said that the content you're looking for is more likely to be near the end than near the beginning, not that you had to scan the lines in reverse order. That means you can do something a little bit simpler. Roughly speaking:
Seek to an offset near the end of the file.
Read line-by-line as normal, until you hit data you've already processed.
Seek to an offset a bit further back from the end of the file.
Read line-by-line as normal, until you hit data you've already processed.
etc.
This way you don't actually have to keep anything more in memory than the single line you're processing right now, and you'll process the data at the end of the file before data earlier in the file. Maybe you could eke out a tiny bit more performance by strictly processing the lines in reverse order but I doubt it will matter compared to the advantage you gain by not scanning from start to finish.
Here's some sample code that implements this algorithm. Note the bit of care taken to avoid processing a partial line:
set BLOCKSIZE 16384
set offset [file size $filename]
set lastOffset [file size $filename]
set f [open $filename r]
while { 1 } {
    seek $f $offset
    if { $offset > 0 } {
        # We may have accidentally read a partial line, because we don't
        # know where the line boundaries are. Skip to the end of whatever
        # line we're in, and discard the content. We'll get it instead
        # at the end of the _next_ block.
        gets $f
        set offset [tell $f]
    }
    while { [tell $f] < $lastOffset } {
        set line [gets $f]
        ### Do whatever you're going to do with the line here
        puts $line
    }
    set lastOffset $offset
    if { $lastOffset == 0 } {
        # All done, we just processed the start of the file.
        break
    }
    set offset [expr {$offset - $BLOCKSIZE}]
    if { $offset < 0 } {
        set offset 0
    }
}
close $f
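If you want this in reusable form, one option is to wrap the same algorithm in a proc that takes the per-line action as a command prefix. This is a sketch of my own; the proc name and argument shape are invented for illustration:
proc eachLineFromEnd {filename cmd {blocksize 16384}} {
    set offset [file size $filename]
    set lastOffset $offset
    set f [open $filename r]
    while { 1 } {
        seek $f $offset
        if { $offset > 0 } {
            gets $f              ;# discard the partial line at the block start
            set offset [tell $f]
        }
        while { [tell $f] < $lastOffset } {
            # Call the command prefix with the line appended as its last argument
            uplevel #0 [linsert $cmd end [gets $f]]
        }
        set lastOffset $offset
        if { $lastOffset == 0 } break
        set offset [expr {$offset - $blocksize}]
        if { $offset < 0 } { set offset 0 }
    }
    close $f
}
# Example: print lines, working block-by-block from the end of the file
eachLineFromEnd myfile.txt puts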
The cost of reversing a file is actually fairly high. The best option I can think of is to construct a list of file offsets of the starts of lines, and then to use a seek;gets pattern to go over that list.
set f [open $filename]
# Construct the list of offsets of the starts of lines
set indices {}
while {1} {
    set idx [tell $f]
    if {[gets $f line] < 0} break
    lappend indices $idx
}
# Iterate backwards
foreach idx [lreverse $indices] {
    seek $f $idx
    set line [gets $f]
    DoStuffWithALine $line
}
close $f
The cost of this approach is non-trivial (even if you happened to have a cache of the indices, you'd still have issues with it) as it doesn't work well with how the OS pre-fetches disk data.

how to read a large file line by line using tcl?

I've written a piece of code using a while loop, but it takes too much time to read the file line by line. Can anyone help me, please?
My code:
set a [open myfile r]
while {[gets $a line]>=0} {
    # do something using the line variable
}
The code looks fine. It's pretty quick (if you're using a sufficiently new version of Tcl; historically, there were some minor versions of Tcl that had buffer management problems) and is how you read a line at a time.
It's a little faster if you can read in larger amounts at once, but then you need to have enough memory to hold the file. To put that in context, files that are a few million lines are usually no problem; modern computers can handle that sort of thing just fine:
set a [open myfile]
set lines [split [read $a] "\n"]
close $a;    # Saves a few bytes :-)
foreach line $lines {
    # do something with each line...
}
If it truly is a large file, then to read in only a line at a time you should do the following; the whole-file read shown above pulls the entire contents into RAM.
https://www.tcl.tk/man/tcl8.5/tutorial/Tcl24.html
#
# Count the number of lines in a text file
#
set infile [open "myfile.txt" r]
set number 0
#
# gets with two arguments returns the length of the line,
# -1 if the end of the file is found
#
while { [gets $infile line] >= 0 } {
    incr number
}
close $infile
puts "Number of lines: $number"
#
# Also report it in an external file
#
set outfile [open "report.out" w]
puts $outfile "Number of lines: $number"
close $outfile

append a string in a file via tcl

I want to open a pre-existing file and add a string inside it, one line before the word 'exit'. The word 'exit' will always be the last line of the file, so this can also be seen as an "add the string one line above the last line" problem. In other words, I want to insert this string into the file. Here is an example:
Example.tcl (before)
AAAAAAA
BBBBBBB
CCCCCC
exit
Example.tcl (after)
AAAAAAA
BBBBBBB
CCCCCC
new_word_string
exit
Any suggestions are most welcome.
Working code:
Open the file for reading, and also open a temporary file:
set f1 [open $thefile]
set f2 [file tempfile]
Read one line at a time until all lines have been read. Look at each line: if it is the string "exit", print the new string to the temporary file. Then write the line you read to the temporary file.
while {[chan gets $f1 line] >= 0} {
    if {$line eq "exit"} {
        chan puts $f2 $thestring
    }
    chan puts $f2 $line
}
Close the file and reopen it for writing.
chan close $f1
set f1 [open $thefile w]
Rewind the temporary file to the start position.
chan seek $f2 0
Read the entire contents of the temporary file and print them to the file.
chan puts -nonewline $f1 [chan read -nonewline $f2]
Close both files.
chan close $f1
chan close $f2
And we're done.
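Assembled into a single proc, a sketch (the proc name is mine; thefile and thestring hold the filename and the line to insert, as above):
proc insertBeforeExit {thefile thestring} {
    set f1 [open $thefile]
    set f2 [file tempfile]
    while {[chan gets $f1 line] >= 0} {
        if {$line eq "exit"} {
            chan puts $f2 $thestring
        }
        chan puts $f2 $line
    }
    chan close $f1
    set f1 [open $thefile w]
    chan seek $f2 0
    chan puts -nonewline $f1 [chan read -nonewline $f2]
    chan close $f1
    chan close $f2
}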
You could use a string buffer instead of a temporary file with minimal changes, to wit:
set f [open $thefile]
set tempstr {}
while {[chan gets $f line] >= 0} {
    if {$line eq "exit"} {
        append tempstr $thestring\n
    }
    append tempstr $line\n
}
chan close $f
set f [open $thefile w]
chan puts -nonewline $f $tempstr
chan close $f
Documentation: append, chan, if, open, set, while
You could farm the work out to an external command (Tcl was written as a glue language after all):
% exec cat example.tcl
AAAAAAA
BBBBBBB
CCCCCC
exit
% set new_line "this is the new line inserted before exit"
this is the new line inserted before exit
% exec sed -i "\$i$new_line" example.tcl
% exec cat example.tcl
AAAAAAA
BBBBBBB
CCCCCC
this is the new line inserted before exit
exit

How to look for the difference between two large files in tcl?

I have two files; some of their contents might be common to both (say file A.txt and file B.txt).
Both files are sorted.
I need to get the difference of A.txt and B.txt, i.e., a file C.txt which has the contents of A except for the contents common to both.
I used the typical search-and-print algorithm: take a line from A.txt, search for it in B.txt; if found, print nothing to C.txt, else print that line to C.txt.
But I am dealing with files with a huge number of lines, and thus it throws the error: failed to load too many files. (Though it works fine for smaller files.)
Can anybody suggest a more efficient way of getting C.txt?
Script to be used: Tcl only!
First off, the too many files error is an indication that you're not closing a channel, probably in the B.txt scanner. Fixing that is probably your first goal. If you've got Tcl 8.6, try this helper procedure:
proc scanForLine {searchLine filename} {
    set f [open $filename]
    try {
        while {[gets $f line] >= 0} {
            if {$line eq $searchLine} {
                return true
            }
        }
        return false
    } finally {
        close $f
    }
}
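For completeness, a hypothetical driver loop around that helper (A.txt, B.txt and C.txt are the filenames from the question). Note that this rescans B.txt once per line of A, so the hash-table approach below is usually preferable:
set in [open A.txt]
set out [open C.txt w]
while {[gets $in line] >= 0} {
    # Keep the line only if it never appears in B.txt
    if {![scanForLine $line B.txt]} {
        puts $out $line
    }
}
close $in
close $out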
However, if one of the files is small enough to fit into memory reasonably, you'd be far better reading it into a hash table (e.g., a dictionary or array):
set f [open B.txt]
while {[gets $f line] >= 0} {
    set B($line) "any dummy value; we'll ignore it"
}
close $f
set in [open A.txt]
set out [open C.txt w]
while {[gets $in line] >= 0} {
    if {![info exists B($line)]} {
        puts $out $line
    }
}
close $in
close $out
This is much more efficient, but depends on B.txt being small enough.
If both A.txt and B.txt are too large for that, you are probably best doing some sort of processing by stages, writing things out to disk in-between. This is getting rather more complex!
set filter [open B.txt]
set fromFile A.txt
for {set tmp 0} {![eof $filter]} {incr tmp} {
    # Filter by a million lines at a time; that'll probably fit OK
    for {set i 0} {$i < 1000000} {incr i} {
        if {[gets $filter line] < 0} break
        set B($line) "dummy"
    }
    # Do the filtering
    if {$tmp} {set fromFile $toFile}
    set from [open $fromFile]
    set to [open [set toFile /tmp/[pid]_$tmp.txt] w]
    while {[gets $from line] >= 0} {
        if {![info exists B($line)]} {
            puts $to $line
        }
    }
    close $from
    close $to
    # Keep control of temporary files and data
    if {$tmp} {file delete $fromFile}
    unset B
}
close $filter
file rename $toFile C.txt
Warning! I've not tested this code…

in tcl, how do I replace a line in a file?

Let's say I opened a file, then parsed it into lines. Then I use a loop:
foreach line $lines {}
Inside the loop, for some lines, I want to replace them in the file with different lines. Is it possible? Or do I have to write to another temporary file, then replace the files when I'm done?
e.g., if the file contained
AA
BB
and then I replace capital letters with lower case letters, I want the original file to contain
aa
bb
Thanks!
For plain text files, it's safest to move the original file to a "backup" name and then rewrite it using the original filename:
Update: edited based on Donal's feedback
set timestamp [clock format [clock seconds] -format {%Y%m%d%H%M%S}]
set filename "filename.txt"
set temp $filename.new.$timestamp
set backup $filename.bak.$timestamp
set in [open $filename r]
set out [open $temp w]
# line-by-line, read the original file
while {[gets $in line] != -1} {
    # transform $line somehow
    set line [string tolower $line]
    # then write the transformed line
    puts $out $line
}
close $in
close $out
# keep the old data under the backup name (link name comes first, then target)
file link -hard $backup $filename
# move the new data to the proper filename
file rename -force $temp $filename
In addition to Glenn's answer. If you would like to operate on the file on a whole contents basis and the file is not too large, then you can use fileutil::updateInPlace. Here is a code sample:
package require fileutil
proc processContents {fileContents} {
    # Search: AA, replace: aa
    return [string map {AA aa} $fileContents]
}
fileutil::updateInPlace data.txt processContents
If this is Linux it'd be easier to exec "sed -i" and let it do the work for you.
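For example, a minimal sketch (this assumes GNU sed is on the PATH; the substitution pattern is illustrative, reusing the AA -> aa example above):
# Replace every occurrence of AA with aa, editing data.txt in place
exec sed -i {s/AA/aa/g} data.txt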
If it's a short file you can just store it in a list:
set temp ""
#saves each line to an arg in a temp list
set file [open $loc]
foreach {i} [split [read $file] \n] {
lappend temp $i
}
close $file
#rewrites your file
set file [open $loc w+]
foreach {i} $temp {
#do something, for your example:
puts $file [string tolower $i]
}
close $file
set fileID [open "lineremove.txt" r]
set temp [open "temp.txt" w+]
while {[gets $fileID lineInfo] >= 0} {
    regsub -all "deleted information type here" $lineInfo "" lineInfo
    puts $temp $lineInfo
}
close $fileID
close $temp
file delete -force lineremove.txt
file rename -force temp.txt lineremove.txt
For the next poor soul looking for a SIMPLE Tcl script to change all occurrences of one word to a new word: the script below reads each line of myfile, changes every red to blue, and writes each resulting line to a new file called mynewfile.
set fin "myfile"
set fout "mynewfile"
set win [open $fin r]
set wout [open $fout w]
while {[gets $win line] != -1} {
set line [regsub {(red)} $line blue]
puts $wout $line
}
close $win
close $wout
