What is the best way to append line(s) to a file?
Currently I am using the following script:
/**
 * Append the `line` to the file given at the `path`.
 *
 * @param path
 *      The absolute or relative path to the file with
 *      extension
 * @param line
 *      The line to append
 * @param [max_lines=10000]
 *      The maximum number of lines to allow for a file
 *      to prevent an infinite loop
 */
void append(string path, string line, number max_lines){
    number f = OpenFileForReadingAndWriting(path);

    // go through file until the end is reached to set the
    // internal pointer to this position
    number line_counter = 0;
    string file_content = "";
    string file_line;
    while(ReadFileLine(f, file_line) && line_counter < max_lines){
        line_counter++;
        // file_content += file_line;
    }

    // result("file content: \n" + file_content + "{EOF}");

    // append the line
    WriteFile(f, line + "\n");
    CloseFile(f);
}

void append(string path, string line){
    append(path, line, 10000);
}

string path = "path/to/file.txt";
append(path, "Appended line");
It seems a little odd to me to read the whole file content just to append one line. If the file is very big, this is probably very slow [1]. So I guess there is a better solution. Does anyone know it?
Some background
My application is written in Python but executed in Digital Micrograph. The Python application logs its steps. Sometimes I execute dm-script from Python, and there I have no way to see what is going on. Since there is a bug, I need something to find out what is happening, so I want to add logging to dm-script too.
This also explains why I want to open and close the file every single time. This takes more time, but I don't care about execution speed while debugging. The logs will either be removed or switched off for the normal version, as usual. On the other hand, I am executing dm-script and Python alternately, so I have to prevent Python from blocking the file for dm-script and the other way around.
[1] As written in the background, I am not really interested in speed, so the current script is enough for me. Still, I am interested in how to do this better, just for learning and curiosity's sake.
The best way to deal with any files in DM-script (binary or text) is to use the streaming object. The following example should answer your question:
void writeText()
{
    string path
    if ( !SaveAsDialog( "Save text as" , path , path ) ) return

    number fileID = CreateFileForWriting( path )
    object fStream = NewStreamFromFileReference( fileID , 1 ) // 1 for auto-close file when out of scope

    // Write some text
    number encoding = 0 // 0 = system default
    fStream.StreamWriteAsText( encoding , "The quick brown dog jumps over the lazy fox" )

    // Replace last 'fox' by 'dog'
    fStream.StreamSetPos( 1 , -3 ) // 3 bytes before current position
    fStream.StreamWriteAsText( encoding, "dog" )

    // Replace first 'dog' by 'fox'
    fStream.StreamSetPos( 0 , 16 ) // 16 bytes after start
    fStream.StreamWriteAsText( encoding, "fox" )

    // Append at end
    fStream.StreamSetPos( 2 , 0 ) // end position (0 bytes from end)
    fStream.StreamWriteAsText( encoding, "." )
}
writeText()
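Since the host application is written in Python, the Python side of the log can append without reading anything first, because append mode positions every write at the end of the file. A minimal sketch, assuming UTF-8 text; the helper name `append_line` and the log location are made up for illustration:

```python
import os
import tempfile


def append_line(path, line):
    """Append `line` plus a newline to `path`, creating the file if needed."""
    # Mode "a" places every write at the end of the file, so nothing
    # has to be read first. The `with` block closes the file right away.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical log location, used only for this demonstration.
    log_path = os.path.join(tempfile.mkdtemp(), "debug.log")
    append_line(log_path, "first entry")
    append_line(log_path, "second entry")
    with open(log_path, encoding="utf-8") as f:
        print(f.read(), end="")
```

Because the file is opened and closed on every call, it is not held open between log entries.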
I have two huge json files (20GB each) and I need to join them. The files have the following content:
file_1.json = [{"key": "value"}, {...}]
file_2.json = [{"key": "value"}, {...}]
The main problem, however, is that I need all the dicts to be in the same list. I tried to do this in Python, but unfortunately I don't have enough memory for this operation.
So, I thought maybe I could tackle this with unix commands, by replacing, in the first file, the ] for , (note that there is a space after the comma) and erasing [ from the second file. Then, I would join the two files with the cat unix command.
Is there a way for me to edit only the last 10 characters of a file in Unix?
I tried to use echo and tr but I might be doing something wrong with the syntax.
You can very easily append to a file in place, i.e. add characters at the end without rewriting the data that's already there. With the right tools (truncate if your system has it), you can truncate a file in place, i.e. remove characters at the end without rewriting the data that's staying. With the right tools (dd, if you're feeling adventurous), you can replace a part of a file by a string of the same length, without rewriting the unchanged parts. On the other hand, you can't remove characters from the beginning or middle of a file without rewriting the file (with a few exceptions that aren't relevant here).
But anyway rewriting both files in place wouldn't help you that much. You will need to at least rewrite the content of the second file to append it to the first file.
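The three in-place operations described above (append, truncate, and same-length overwrite) map onto ordinary file calls. A small sketch in Python on a throwaway file; the function name `demo_in_place_edits` and the sample content are made up for illustration:

```python
import os
import tempfile


def demo_in_place_edits(path):
    """Append, truncate, and overwrite bytes without rewriting the rest."""
    with open(path, "wb") as f:
        f.write(b'[{"a": 1}, {"b": 2}]\n')

    # Append: new bytes go after the existing data.
    with open(path, "ab") as f:
        f.write(b"tail")

    # Truncate: drop the 4 bytes just appended by shortening the file.
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        f.truncate(size - 4)

    # Overwrite: replace the final "]\n" with ", " (same length, 2 bytes).
    with open(path, "r+b") as f:
        f.seek(-2, os.SEEK_END)
        f.write(b", ")

    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
    print(demo_in_place_edits(demo_path))
```

None of these calls touch the bytes at the start of the file, which is why they stay cheap even for very large files.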
If you don't need to keep the split files around, you can append the second file to the first file in place, after taking care of the middle punctuation. Remove the last ] character from the first file, as well as any following spaces and line breaks. Assuming that the first file ends in ] and a newline and you have GNU core utilities (e.g. non-embedded Linux):
truncate -s -2 file_1.json
Now you can add a comma and optionally a line break to the first file, and append the data from the second file without its first character.
echo , >>file_1.json
tail -c +2 file_2.json >>file_1.json
If you want to keep the original files unmodified, you can make a copy of the first file and truncate it. Or you can directly make a truncated copy of the first file (still assuming GNU coreutils):
head -c -2 file_1.json >concatenated.json
echo , >>concatenated.json
tail -c +2 file_2.json >>concatenated.json
If you're more comfortable with Python, you can do all of this in Python. Just don't read the whole file in one go, i.e. don't call read() without a size argument or use readline() in a way that reads all the lines at once. Instead, read and process a single line at a time (if the lines are short) or a single block of data. Untested code:
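import io
import re

BLOCK_SIZE = 1024 * 1024  # copy in 1 MiB blocks

with open('concatenated.json', 'wb') as out:
    with open('file_1.json', 'rb') as inp:
        # Look at the last 1024 bytes to locate the closing ] and any
        # whitespace after it.
        size = inp.seek(0, io.SEEK_END)
        tail_start = max(0, size - 1024)
        inp.seek(tail_start, io.SEEK_SET)
        m = re.search(rb'\]\s*\Z', inp.read(1024))
        stop_at = tail_start + m.start()  # absolute offset of the final ]
        # Copy everything before that ] one block at a time.
        inp.seek(0, io.SEEK_SET)
        remaining = stop_at
        while remaining > 0:
            chunk = inp.read(min(BLOCK_SIZE, remaining))
            if not chunk:
                break
            out.write(chunk)
            remaining -= len(chunk)
    out.write(b',\n')
    with open('file_2.json', 'rb') as inp:
        # Drop the opening [ of the second array, then copy the rest.
        first = inp.read(1)
        assert first == b'['
        chunk = inp.read(BLOCK_SIZE)
        while chunk:
            out.write(chunk)
            chunk = inp.read(BLOCK_SIZE)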
I wanted to know whether there is a piece of code that can help me with making a new file with a different name every time the function with that code runs, for instance I want to write this
FILE * new_file = fopen("D:\C\data\1.txt", "w +" );
as
FILE * new_file = fopen("D:\C\data\%d.txt", next_number, "w +" );
that next_number comes from another procedure which always will get the number following a number stored in a different file.
You can use snprintf().
Example:
char file_name[100]; // assuming path length is at most 100
snprintf(file_name, sizeof(file_name), "D:\\C\\data\\%d.txt", next_number);
FILE * new_file = fopen(file_name, "w+" );
I need to count the number of vowels in a text file given to me (with python program) and return the number. For whatever reason when I run the program the file returns 0 vowels even though the count variable is supposed to increase by one each time it loops and finds a vowel.
def numVowels(file):
    count = 0
    opened_file = open(file)
    content = opened_file.readlines()
    for char in content:
        if char.lower() in 'aeiou':
            count += 1
    return(count)
I'm not sure if that is because I am working with a text file, but usually I am able to do this without an issue. Any help is greatly appreciated.
Thank you!
readlines() returns a list of lines from the file, so in for char in content: each char is actually a whole line of text, which isn't what you are looking for. A whole line (with its trailing newline) will normally not be a substring of 'aeiou', so the count stays at 0.
You can read() the whole file into memory, or iterate through the file line by line and then iterate through each line a character at a time:
def numVowels(file):
    count = 0
    with open(file) as opened_file:
        for content in opened_file:
            for char in content:
                if char.lower() in 'aeiou':
                    count += 1
    return count
You can sum a generator of 1's to produce the same value:
def numVowels(file):
    with open(file) as f:
        return sum(1 for content in f for char in content if char.lower() in 'aeiou')
I'm trying to open this file (final.txt) and read the contents:
c0001
f260
L
D11
H30
R0000
C0040
1X1100000100010B300300003003
181100202900027Part No
181100202900097[PRTNUM]
1e5504002400030B
1X1100002300010L300003
191100202000030Quantity
191100202000080[QUANTY]
1e5504001500040B
1X1100001400010L300003
1X1100001400150L003090
191100202000170P.O.No
191100202000220[PONUMB]
1e5504001500180B
191100201200030Supplier
1e3304000700030B
1X1100000600010L300003
181100200300030Serial
181100200300090[SERIAL]
171100300900190Rev
171100300300190[REV]
171100300900240Units
171100300300240[UNITS]
1X1100000100180L003130
Q0001
E
from which I am reading only [PRTNUM], [QUANTY], [PONUMB], [SERIAL], [UNITS].
I've written the following C program:
char* cStart = strchr(cString, '[');
if (cStart)
{
    // open bracket found
    *cStart++ = '\0'; // split the string at [
    char* cEnd = strchr(cStart, ']');
    // you could check here for the close bracket being found
    // and throw an exception if not
    *cEnd = '\0'; // terminate the keyword
    printf("Key: %s, Value: %s",cString, cStart);
}
// continue the loop
but now I want to replace these placeholders with data from the 2nd file:
132424235
004342
L1000
DZ12
234235
234235
I want to replace [PRTNUM] (from the 1st file) with 132424235 and so on... In the end my file should be updated with all this data. Can you tell me what function I should use in the above program?
If you don't mind having an alternate approach, here's an algorithm to do the work in an elegant way
1. Create one (large enough) temporary buffer. Also, create (open) one output file which will be the modified version.
2. Read a line from the input file into the buffer using fgets()
3. Search for the particular "keyword" using strstr()
4. If a match is found --
   4.1. Open the other input file.
   4.2. Read the corresponding data (line), using fgets()
   4.3. Replace the actual data in the temporary buffer with the newly read value.
   4.4. write the modified data to the output file.
5. If match is not found, write the original data in the output file. Then, go to step 2.
6. Continue until fgets() returns NULL (indicates the file content has been exhausted).
Finally, the output file will have the data from the first file with those particular "placeholders" substituted with the value read from the second file.
Obviously, you need to polish the algorithm a little bit to make it work with multiple "placeholder" strings.
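The steps above can be sketched compactly in Python; each step maps onto fgets(), strstr(), and fputs() in a C version. This is only an illustration under two assumptions that are not stated in the question: any bracketed run of capital letters counts as a placeholder, and the second file lists its values in the same order in which the placeholders appear. The names `fill_placeholders` and `fill_file` are hypothetical.

```python
import re

# Matches a bracketed placeholder such as [PRTNUM].
PLACEHOLDER = re.compile(r"\[[A-Z]+\]")


def fill_placeholders(template_lines, value_lines):
    """Yield template lines with each placeholder replaced by the next value."""
    values = iter(value_lines)

    def next_value(match):
        # Assumption: values are consumed strictly in order; if they run
        # out, the placeholder is left unchanged.
        value = next(values, None)
        return match.group(0) if value is None else value.strip()

    for line in template_lines:
        yield PLACEHOLDER.sub(next_value, line)


def fill_file(template_path, values_path, output_path):
    """Stream the template to `output_path`, one line at a time."""
    with open(template_path) as template, \
         open(values_path) as values, \
         open(output_path, "w") as out:
        out.writelines(fill_placeholders(template, values))
```

Writing to a separate output file, as in step 1, means the original template is untouched if something fails halfway through.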
Keep an extra string (call it copy) large enough to hold the contents of file 1 plus some extra room for replacing [PRTNUM] with 132424235.
Read through the first string, which holds file 1, and keep copying it into the second string (copy). As soon as you encounter [PRTNUM], append 132424235 to copy instead of copying the placeholder, and do the same for all the other placeholders.
Finally, replace the contents of file1.txt with this second string (copy).
Is there any way to figure out the length of a .dat file (in terms of rows) without loading the file into the workspace?
Row Counter -- only loads one character per row:
Nrows = numel(textread('mydata.txt','%1c%*[^\n]'))
or file length (Matlab):
datfileh = fopen(fullfile(path, filename));
fseek(datfileh, 0,'eof');
filelength = ftell(datfileh);
fclose(datfileh);
I'm assuming you are working with text files, since you mentioned finding the number of rows.
Here's one solution:
fid = fopen('your_file.dat','rt');
nLines = 0;
while (fgets(fid) ~= -1),
nLines = nLines+1;
end
fclose(fid);
This uses FGETS to read each line, counting the number of lines it reads. Note that the data from the file is never saved to the workspace, it is simply used in the conditional check for the while loop.
It's also worth bearing in mind that you can use your operating system's built-in commands, so on Linux you could use the command
[s,w] = system('wc -l your_file.dat');
and then get the number of lines from the returned text (which is stored in w). (I don't think there's an equivalent command under Windows.)