Efficiency of reading characters from a file in C

I am teaching myself to build a compiler. When I read the source file, I sometimes need to look a few characters ahead of my current position to know which token to generate.
There are two options that come to mind:
I read the entire file first and access the characters with an index variable.
I read one char at a time with getc(file);, and when I have to go back to a previous character I use fseek(file, -1, SEEK_CUR);
Which one of these options is more efficient? Which would you prefer?

Thanks for the comments. I've decided to simply read the entire file first and check later whether I run into any performance issues.
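A minimal sketch of the read-everything-first approach, assuming the source file fits comfortably in memory. `read_all` and `peek` are hypothetical helper names, not library functions. The file is read in growing chunks instead of asking fseek/ftell for its size, so the same code also works on non-seekable input such as a pipe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole stream into a NUL-terminated heap
   buffer. Returns NULL on error; *len_out receives the number of bytes. */
char *read_all(FILE *f, size_t *len_out)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    /* Always leave one byte free for the terminating '\0'. */
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len == cap - 1) {                  /* full: double the capacity */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(f)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/* Lookahead without consuming anything: the byte k places after pos,
   or EOF when that would be past the end of the buffer. */
int peek(const char *buf, size_t len, size_t pos, size_t k)
{
    assert(buf != NULL);
    return pos + k < len ? (unsigned char)buf[pos + k] : EOF;
}
```

If a single character of lookahead is enough, the standard `ungetc(c, file)` pushes a character back onto the stream (only one pushback is guaranteed). It is more portable than `fseek(file, -1, SEEK_CUR)`, because on a text stream the C standard only allows a nonzero offset together with SEEK_SET and a value previously returned by ftell.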

Related

Is it possible to count the frequency of a word in a file precisely using two buffers in C?

I have a 1GB file and I want to find out how many times the word "sosowhat" occurs in it. I wrote code using fgetc(), which reads one character at a time from the file, but it is far too slow for a 1GB file. So I made a buffer of 1000 bytes (using malloc) to hold 1000 characters of the file at a time, and I used the strstr() function to count the occurrences of "sosowhat". The logic is fine, but if the "so" part of "sosowhat" is at the end of one buffer and the "sowhat" part is at the start of the next, the word is not counted. So I used two buffers, old_buffer and current_buffer. At the start of each new buffer I want to also check the last few characters of the old buffer. Is this possible? How can I go back to the old buffer? Is it possible without memmove()? As a beginner, I would be very grateful for your help.
Yes, it can be done, and there are several possible approaches.
The first, and cleanest, is to keep a second buffer, as you suggested, the length of the searched word, in which you keep the last chunk of the old buffer. (It needs to be exactly the length of the searched word because you store wordLen - 1 characters plus the NUL terminator.) The quickest way is then to append the first wordLen - 1 characters of the new buffer to this stored chunk and search for your word there; after that, continue searching the new buffer normally. The combined string needs room for both chunks (2 * (wordLen - 1) + 1 bytes), so in practice you create one buffer that can hold both the last bytes of the old buffer and the first bytes of the new one.
Another approach (which I don't recommend, but which can turn out to be a bit simpler to code) is to fseek wordLen - 1 bytes backwards in the file after each read. This effectively "moves" the chunk stored in the previous approach into the next buffer. It is a bit dirtier, because you read part of the file twice. Although that is not noticeable in terms of performance, I again recommend against it and suggest something like the first approach instead.
Use the same algorithm as with fgetc(), but read from the buffers you created. It will be just as efficient, since strstr() also has to scan the string character by character.
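A sketch of the first approach, assuming the file contains no '\0' bytes (strstr stops at them) and that overlapping occurrences should be counted. `count_word` and `count_in` are hypothetical names. Here `edge` plays the role of the combined buffer: the last wordLen - 1 bytes seen so far, followed by the first wordLen - 1 bytes of the new chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count occurrences of word in the NUL-terminated string s (overlaps allowed). */
static size_t count_in(const char *s, const char *word)
{
    size_t count = 0;
    for (const char *p = strstr(s, word); p != NULL; p = strstr(p + 1, word))
        count++;
    return count;
}

/* Hypothetical helper: count word in f, reading `chunk` bytes at a time. */
size_t count_word(FILE *f, const char *word, size_t chunk)
{
    size_t w = strlen(word), total = 0, tlen = 0, n;
    assert(w > 0 && chunk > 0);
    char *buf = malloc(chunk + 1);             /* new chunk + '\0' */
    char *edge = malloc(2 * (w - 1) + 1);      /* old tail + new head + '\0' */
    if (buf == NULL || edge == NULL) {
        free(buf);
        free(edge);
        return 0;
    }
    while ((n = fread(buf, 1, chunk, f)) > 0) {
        buf[n] = '\0';
        /* 1. Matches straddling the boundary: old tail + new head. */
        size_t head = n < w - 1 ? n : w - 1;
        memcpy(edge + tlen, buf, head);
        edge[tlen + head] = '\0';
        total += count_in(edge, word);
        /* 2. Matches lying entirely inside the new chunk. */
        total += count_in(buf, word);
        /* 3. Keep the last w-1 bytes seen so far as the next tail. */
        if (n >= w - 1) {
            memcpy(edge, buf + n - (w - 1), w - 1);
            tlen = w - 1;
        } else {                       /* tiny chunk: tail spans old and new */
            size_t keep = tlen + n < w - 1 ? tlen + n : w - 1;
            memmove(edge, edge + tlen + n - keep, keep);
            tlen = keep;
        }
    }
    free(buf);
    free(edge);
    return total;
}
```

Because neither half of `edge` can hold a whole word on its own, every match found there must straddle the boundary, so nothing is counted twice. memmove is only needed in the corner case where a chunk is shorter than wordLen - 1 bytes; with a 1000-byte buffer and an 8-letter word, the tail is simply memcpy'd from the end of the new chunk.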
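For comparison, a sketch of the fseek variant, assuming a seekable file opened in binary mode ("rb"; on a text stream the C standard does not allow a nonzero offset with SEEK_CUR) and a chunk size larger than wordLen - 1, so the read position keeps moving forward. `count_word_seek` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: after each full chunk, step back w-1 bytes so the
   next read starts with the tail of the previous one. */
size_t count_word_seek(FILE *f, const char *word, size_t chunk)
{
    size_t w = strlen(word), total = 0, n;
    assert(w > 0 && chunk > w - 1);
    char *buf = malloc(chunk + 1);
    if (buf == NULL)
        return 0;
    while ((n = fread(buf, 1, chunk, f)) > 0) {
        buf[n] = '\0';
        for (const char *p = strstr(buf, word); p != NULL; p = strstr(p + 1, word))
            total++;
        if (n < chunk)                     /* short read: end of file */
            break;
        /* The w-1 overlapping bytes can never hold a whole match,
           so re-reading them does not double count. */
        if (w > 1 && fseek(f, -(long)(w - 1), SEEK_CUR) != 0)
            break;
    }
    free(buf);
    return total;
}
```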
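A sketch of this answer's idea: keep the match state in a variable that survives from one buffer to the next, so buffer boundaries stop mattering. A naive "restart on mismatch" loop (the usual fgetc version) misses matches such as "sosowhat" inside "sososowhat", so this sketch swaps in the Knuth-Morris-Pratt failure table to fall back correctly. `count_word_kmp` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: streaming Knuth-Morris-Pratt word counter. The
   match state q survives from one buffer to the next, so words split
   across a buffer boundary are still found. Overlapping matches count. */
size_t count_word_kmp(FILE *f, const char *word, size_t chunk)
{
    size_t w = strlen(word), total = 0, q = 0, n;
    assert(w > 0 && chunk > 0);
    size_t *fail = malloc(w * sizeof *fail);
    char *buf = malloc(chunk);
    if (fail == NULL || buf == NULL) {
        free(fail);
        free(buf);
        return 0;
    }
    /* fail[i]: length of the longest proper prefix of word[0..i]
       that is also a suffix of it. */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < w; i++) {
        while (k > 0 && word[i] != word[k])
            k = fail[k - 1];
        if (word[i] == word[k])
            k++;
        fail[i] = k;
    }
    while ((n = fread(buf, 1, chunk, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            while (q > 0 && buf[i] != word[q])
                q = fail[q - 1];
            if (buf[i] == word[q])
                q++;
            if (q == w) {                 /* a full match ends at buf[i] */
                total++;
                q = fail[q - 1];
            }
        }
    }
    free(fail);
    free(buf);
    return total;
}
```

Unlike the strstr versions, this reads each byte exactly once and also tolerates '\0' bytes in the file.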

How to dynamically change a string in an I/O stream in C

I was looking at a problem in K&R (Exercise 1-18), which asks you to remove trailing blanks and tabs. That made me think about text messengers like WhatsApp. Say I am writing the word "parochial": as soon as I have typed "paro", it shows "parochial" as an option, and when I click on it, it replaces the entire word (even if I had misspelled it).
What I am thinking is that the pointer goes back to the start of the word; or rather, at the start of every new word I type, the pointer gets fixed at its first letter, and if I choose an option, the entire word is replaced in the stream (I don't know if I'm thinking in the right direction).
I can use getchar() to move to the next letter, but how do I:
1: Go backward from the current position in the stream (by using fseek())?
2: Fix a pointer at a position in an I/O stream, so that I can fix it at the beginning of a new word?
Please tell me whether my approach is correct or whether I need to understand some different concept. Thanks in advance.
Standard streams are mainly designed for going forward*, minimizing the number of I/O system calls, and avoiding the need to keep large files in memory all at once.
A GUI app is likely to want to keep all of its display output in memory, and when you have the whole thing in memory, going back and forth is just a simple matter of incrementing and decrementing pointers or indices.
*(Random seeks aren't always optimal, and relying on them stops your code from working on non-seekable files such as pipes or sockets.)
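To illustrate the in-memory alternative, here is a minimal sketch of a text buffer that remembers where the current word starts, so accepting a suggestion is just an overwrite at that index. The `struct text` type, its fixed 256-byte capacity and the function names are hypothetical; a real editor would grow the buffer and recognize more word separators.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory text, as a messenger UI might keep it. */
struct text {
    char data[256];
    size_t len;         /* bytes currently used (excluding the '\0') */
    size_t word_start;  /* index where the word being typed begins */
};

/* Append one typed character; a blank or tab ends the current word. */
void type_char(struct text *t, char c)
{
    if (t->len + 1 >= sizeof t->data)
        return;                          /* sketch: drop input when full */
    t->data[t->len++] = c;
    t->data[t->len] = '\0';
    if (c == ' ' || c == '\t')
        t->word_start = t->len;          /* next word starts after the blank */
}

/* Replace the partially typed (possibly misspelled) word with a suggestion. */
void accept_suggestion(struct text *t, const char *word)
{
    size_t wl = strlen(word);
    if (t->word_start + wl + 1 > sizeof t->data)
        return;
    memcpy(t->data + t->word_start, word, wl + 1);   /* copies the '\0' too */
    t->len = t->word_start + wl;
}
```

Going "back" here is just reading `word_start`; no stream seeking is involved.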

Most efficient way to replace a line in a text document?

I am learning to code in C on Unix. So far I have written code that finds the index of the first byte of the line I want to replace. The problem is that the replacement is sometimes longer than the line it replaces, and in that case my code starts overwriting the next line. I came up with two standard solutions:
a) Rather than trying to edit the file in place, I could copy the entire file into memory, edit it by shifting bytes as necessary, and write it back to the file.
b) Copy only the part from the line I want to replace to the end of the file into memory, edit it, and write it back.
Neither solution scales well, and I don't want to impose any restriction on line length (like every line must be 50 bytes). Is there an efficient way to replace a line? Any help would be appreciated.
Copy the first part of the file to a new file (no need to read it all into memory). Then write the new version of the line. Finally, copy the rest of the file. Swap the files and you're done.
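A sketch of the copy-and-swap approach, assuming lines end in '\n' and `target` is a 0-based line number; `replace_line` is a hypothetical name. It streams the file with getc/putc, so memory use does not depend on the file or line size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, replacing 0-based line `target`
   with `replacement` (which should end in '\n'). Bytes are streamed, so
   neither file is held in memory. Returns 0 on success, -1 on error. */
int replace_line(FILE *src, FILE *dst, long target, const char *replacement)
{
    size_t rlen = strlen(replacement);
    long line = 0;
    int c;
    assert(target >= 0);
    while ((c = getc(src)) != EOF) {
        if (line == target) {
            /* Skip the old line, including its '\n', then write the new one. */
            while (c != EOF && c != '\n')
                c = getc(src);
            if (fwrite(replacement, 1, rlen, dst) != rlen)
                return -1;
            line++;
            continue;
        }
        if (putc(c, dst) == EOF)
            return -1;
        if (c == '\n')
            line++;
    }
    return (ferror(src) || ferror(dst)) ? -1 : 0;
}
```

For the "swap" step you would typically write to a temporary file in the same directory and then rename() it over the original. On POSIX systems rename() replaces an existing target atomically; the C standard itself leaves the behaviour implementation-defined when the target already exists.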

How to set the file pointer's position in terms of lines in C

As I understand it, the fseek function sets the file pointer's new position measured in bytes. How can we move the file pointer to a new position measured in lines?
The short answer: there's no easy way. A file in C is a sequence of bytes, and nothing makes the bytes '\n' and '\r' special to the positioning functions (line endings depend on your system). If you really need a general solution, I would recommend building a lookup table of the byte offsets of the line endings (or, equivalently, the line starts) as you read the file, and then using it to jump around in the file later on.
You can't point directly to a line; you have to read the file.
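A sketch of the lookup-table idea, assuming '\n' line endings and a binary stream, so the offsets can simply be counted; `index_lines` and `seek_line` are hypothetical names. On a text stream you would record ftell() at each line start instead, since only ftell values are guaranteed to be valid fseek targets there.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: record the byte offset at which every line starts.
   Returns a malloc'd array (caller frees); *nlines receives its length. */
long *index_lines(FILE *f, size_t *nlines)
{
    size_t cap = 64, count = 0;
    long *starts = malloc(cap * sizeof *starts);
    long pos = 0;
    int c, at_line_start = 1;
    if (starts == NULL)
        return NULL;
    rewind(f);
    while ((c = getc(f)) != EOF) {
        if (at_line_start) {             /* first byte of a new line */
            if (count == cap) {
                long *tmp = realloc(starts, 2 * cap * sizeof *starts);
                if (tmp == NULL) {
                    free(starts);
                    return NULL;
                }
                starts = tmp;
                cap *= 2;
            }
            starts[count++] = pos;
            at_line_start = 0;
        }
        pos++;
        if (c == '\n')
            at_line_start = 1;
    }
    *nlines = count;
    return starts;
}

/* Jump straight to the start of 0-based line n using the table. */
int seek_line(FILE *f, const long *starts, size_t nlines, size_t n)
{
    assert(starts != NULL || nlines == 0);
    if (n >= nlines)
        return -1;
    return fseek(f, starts[n], SEEK_SET);
}
```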
The basic stdio functions operate on bytes only. You will have to read the file byte by byte and count the lines yourself.
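The byte-by-byte approach in its simplest form, assuming '\n' line endings; `goto_line` is a hypothetical name. Every call re-reads everything before the requested line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: position f at the start of 0-based line n by
   reading from the beginning and counting '\n' bytes. Returns 0 on
   success, -1 if the file contains fewer than n newlines. */
int goto_line(FILE *f, long n)
{
    int c;
    assert(f != NULL && n >= 0);
    rewind(f);
    while (n > 0 && (c = getc(f)) != EOF)
        if (c == '\n')
            n--;
    return n == 0 ? 0 : -1;
}
```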
I faced the same problem. My solution was to store the seek positions of some of the lines and do a forward search from there.
E.g., if you have a million lines, you can store the seek position of every thousandth line.
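A sketch of this sparse index, assuming '\n' line endings and a binary stream; `struct line_index`, `build_index` and `seek_to_line` are hypothetical names. With `stride` set to 1000, a million-line file needs about a thousand stored offsets, and any seek then reads at most 999 lines forward.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sparse index: byte offsets of lines 0, stride, 2*stride, ... */
struct line_index {
    long *offsets;   /* caller frees */
    size_t count;
    long stride;
};

int build_index(FILE *f, long stride, struct line_index *ix)
{
    size_t cap = 16;
    long line = 0, pos = 0;
    int c;
    assert(stride > 0);
    ix->count = 0;
    ix->stride = stride;
    ix->offsets = malloc(cap * sizeof *ix->offsets);
    if (ix->offsets == NULL)
        return -1;
    ix->offsets[ix->count++] = 0;             /* line 0 starts at byte 0 */
    rewind(f);
    while ((c = getc(f)) != EOF) {
        pos++;
        if (c == '\n' && ++line % stride == 0) {
            if (ix->count == cap) {
                long *tmp = realloc(ix->offsets, 2 * cap * sizeof *tmp);
                if (tmp == NULL)
                    return -1;
                ix->offsets = tmp;
                cap *= 2;
            }
            ix->offsets[ix->count++] = pos;   /* where line `line` begins */
        }
    }
    return 0;
}

/* Jump to the nearest indexed line at or before n, then scan forward. */
int seek_to_line(FILE *f, const struct line_index *ix, long n)
{
    long remaining;
    int c;
    if (n < 0 || (size_t)(n / ix->stride) >= ix->count)
        return -1;
    size_t slot = (size_t)(n / ix->stride);
    if (fseek(f, ix->offsets[slot], SEEK_SET) != 0)
        return -1;
    remaining = n - (long)slot * ix->stride;
    while (remaining > 0 && (c = getc(f)) != EOF)
        if (c == '\n')
            remaining--;
    return remaining == 0 ? 0 : -1;
}
```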

How to add one line before the last line in C

Hi, I am working in C on a Unix platform. Please tell me how to add one line before the last line of a file in C. I have used fopen in append mode, but I can't add a line before the last line.
I just want to write to the second-to-last line of the file.
You don't need to overwrite the whole file. You just have to:
open your file for reading and writing, with mode "r+" ("rw" is not a valid fopen mode),
read your file to find the last line, and store its position in the file (ftell/ftello) and its contents,
go back to the beginning of the last line (fseek/fseeko)
write whatever you want before the last line
write the last line.
close your file.
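A sketch of the steps above, assuming the file was opened with "r+b" and that `new_line` ends with '\n'; `insert_before_last_line` is a hypothetical name. A trailing '\n' is treated as the end of the last line, not the start of an empty one. The C standard requires a file-positioning call such as fseek before an update stream switches from reading to writing (unless the read hit end-of-file), which is why the code seeks back before writing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for a stream opened with "r+b". */
int insert_before_last_line(FILE *f, const char *new_line)
{
    long last_start = 0, pos = 0;
    int c, prev = '\n';
    assert(f != NULL && new_line != NULL);
    /* Read through the file, remembering where the last line starts. */
    rewind(f);
    while ((c = getc(f)) != EOF) {
        if (prev == '\n')
            last_start = pos;
        prev = c;
        pos++;
    }
    /* Save the last line's contents. */
    size_t len = (size_t)(pos - last_start);
    char *last = malloc(len + 1);
    if (last == NULL)
        return -1;
    if (fseek(f, last_start, SEEK_SET) != 0 || fread(last, 1, len, f) != len) {
        free(last);
        return -1;
    }
    /* Go back to the start of the last line, write the new line,
       then rewrite the saved last line after it. */
    size_t nlen = strlen(new_line);
    int ok = fseek(f, last_start, SEEK_SET) == 0
          && fwrite(new_line, 1, nlen, f) == nlen
          && fwrite(last, 1, len, f) == len
          && fflush(f) == 0;
    free(last);
    return ok ? 0 : -1;
}
```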
There is no way to insert into the middle of a file directly in standard C, mostly because few file systems support that operation. The easiest way around this is to read the file into an in-memory structure (where you probably have it anyway), insert the line in memory, and then write the whole structure out again, overwriting the original file.
Append only appends to the end, not in the middle.
You need to read in the entire file and then write it out to a new file. You might have better luck starting from the back and finding the byte offset of the second-to-last linefeed. Then you can block-write the entire "prelude", add your new line, and then emit the remaining trailer.
You can find the place where the last line ends, read the last line into memory, seek back to the place, write the new line, and then the last line.
To find the place: seek to the end minus a buffer size, read the buffer, and look for a newline. If none is found, seek backwards two buffer sizes (one for the buffer you just read, one for the next) and try again.
You'll need to use the r+ mode for fopen.
Oh, and you'll need to be careful about text and binary modes. You need to use binary mode, since in text mode you can't compute seek positions; you can only jump to locations you got from ftell. You can work around that by reading through the entire file and calling ftell at the beginning of each line, but for large files that is going to be slow.
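A sketch of this answer's backwards search, assuming a binary, seekable stream and '\n' line endings; `find_last_line_start` and the 4096-byte `CHUNK` are hypothetical. Instead of seeking back "two buffer sizes" relative to the current position, it computes each chunk's absolute start and uses SEEK_SET, which amounts to the same thing. A trailing '\n' is treated as ending the last line.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 4096   /* assumed buffer size; any positive value works */

/* Hypothetical helper: byte offset where the last line starts, found by
   scanning backwards from the end one CHUNK at a time. Returns -1 on error. */
long find_last_line_start(FILE *f)
{
    char buf[CHUNK];
    long end;
    assert(f != NULL);
    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < 0)
        return -1;
    if (end > 0) {                             /* ignore a final newline */
        if (fseek(f, end - 1, SEEK_SET) != 0)
            return -1;
        if (getc(f) == '\n')
            end--;
    }
    while (end > 0) {
        long start = end > CHUNK ? end - CHUNK : 0;
        size_t n = (size_t)(end - start);
        if (fseek(f, start, SEEK_SET) != 0 || fread(buf, 1, n, f) != n)
            return -1;
        for (size_t i = n; i > 0; i--)         /* newest bytes first */
            if (buf[i - 1] == '\n')
                return start + (long)i;        /* line starts after the '\n' */
        end = start;                           /* not found: previous chunk */
    }
    return 0;                                  /* no newline: single line */
}
```

From the returned offset you can continue as this answer describes: read the last line into memory, fseek back to the offset, write the new line, and then write the saved last line.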
Use fseek to jump to the end of the file and read backwards until you encounter a newline. Then insert your line.
You might want to save the last line while reading it, by counting how many characters you read backwards, and then strncpy it into a properly allocated buffer.
