How to read the last n lines from a file in C

It's a Microsoft interview question.
Read the last n lines of a file using C (precisely).
There are many ways to achieve this; a few of them:
-> Simplest of all: in a first pass, count the number of lines in the file, and in a second pass display the last n lines.
-> Or maintain a doubly linked list with one node per line, and display the last n lines by traversing backwards from the tail to the nth-last node.
-> Implement something like tail -n fname.
-> To optimize further, keep an array of n pointers (a double pointer) and store each line dynamically in round-robin fashion until we reach the end of the file.
For example, if the file has 10 lines and we want the last 3, we could create an array of buffers as buf[3][] and, at run time, keep mallocing and freeing the buffers in a circular way until we reach the last line, with a counter tracking the current index into the array.
Can anyone help me with a more optimized solution, or at least tell me whether any of the above approaches gives the correct answer, or point me to another popular approach for this kind of question?

You can use a queue to store the last n lines seen. When you hit EOF, just print the queue.
Another way is to read blocks of 1024 bytes from the end of the file towards the beginning. Stop when you have found n \n characters (n+1 if the file ends with a newline) and print the last n lines.
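A minimal sketch of the second idea (reading blocks backwards from the end), assuming the file is opened in binary mode so that fseek/ftell offsets are plain byte counts. The function name tail_offset and the 1024-byte block size are illustrative choices, not something the answer above prescribes.

```c
#include <stdio.h>

/*
 * Returns the byte offset at which the last n lines of fp begin,
 * 0 if the file has fewer than n lines, or -1 on error (or n <= 0).
 * Assumes fp was opened in binary mode ("rb").
 */
long tail_offset(FILE *fp, int n)
{
    char buf[1024];
    long size, pos;
    int newlines = 0;

    if (n <= 0 || fseek(fp, 0, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;

    pos = size;
    while (pos > 0) {
        /* Step back by one block (or by whatever is left). */
        long chunk = pos < (long)sizeof buf ? pos : (long)sizeof buf;
        pos -= chunk;
        if (fseek(fp, pos, SEEK_SET) != 0)
            return -1;
        if (fread(buf, 1, (size_t)chunk, fp) != (size_t)chunk)
            return -1;

        /* Scan the block from its end towards its beginning. */
        for (long i = chunk - 1; i >= 0; i--) {
            if (buf[i] != '\n')
                continue;
            if (pos + i == size - 1)
                continue;           /* newline that terminates the last line */
            if (++newlines == n)
                return pos + i + 1; /* first byte after the nth newline */
        }
    }
    return 0; /* fewer than n lines: the whole file qualifies */
}
```

To print the lines, a caller would fseek to the returned offset and copy bytes to stdout until EOF.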

You can have two file pointers, both initially pointing to the beginning of the file.
Keep advancing the first pointer until it finds a '\n' character, and save the file position each time it finds one.
Once it finds the (n+1)th '\n', assign the earliest saved position to the second file pointer. Keep doing the same until EOF, moving the second pointer forward by one saved position for each new '\n'.
So when the first file pointer is at EOF, the second will be n '\n' characters back. Then print all characters from the second file pointer to EOF.
This solution finds the last n lines of the file in a single pass.
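One way to sketch this, assuming a seekable file opened in binary mode: instead of a literal second FILE pointer, the code below keeps the saved positions in a small ring buffer of n offsets, which plays the role of the trailing pointer. The name print_last_lines is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Copies the last n lines of fp to out.
 * One forward pass records where each line starts; only the most
 * recent n start offsets are kept (ring buffer).
 * Returns 0 on success, -1 on allocation or seek failure.
 */
int print_last_lines(FILE *fp, int n, FILE *out)
{
    long *starts;           /* offsets where the most recent n lines begin */
    long pos = 0;           /* offset of the byte about to be read */
    long lines = 0;         /* number of lines seen so far */
    int at_line_start = 1;
    int c;

    if (n <= 0)
        return 0;
    starts = malloc((size_t)n * sizeof *starts);
    if (starts == NULL)
        return -1;

    rewind(fp);
    while ((c = getc(fp)) != EOF) {
        if (at_line_start) {
            starts[lines % n] = pos;   /* overwrites the oldest entry */
            lines++;
        }
        at_line_start = (c == '\n');
        pos++;
    }

    if (lines > 0) {
        /* Once the ring has wrapped, the oldest entry sits at lines % n. */
        long first = lines > n ? starts[lines % n] : starts[0];
        if (fseek(fp, first, SEEK_SET) != 0) {
            free(starts);
            return -1;
        }
        while ((c = getc(fp)) != EOF)
            putc(c, out);
    }
    free(starts);
    return 0;
}
```

The scan itself is a single pass; the final copy re-reads only the lines being printed.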

How about using a memory-mapped file and scanning it backwards? This eliminates the hard work of updating the buffer window whenever lines happen to be longer than your buffer space. Each time you find a \n, push its position onto a stack. This works in O(L), where L is the number of characters to output. So there is nothing really better than that, is there?

Related

Is it possible to count the frequency of a word in a file precisely using two buffers in C?

I have a file of size 1 GB. I want to find out how many times the word "sosowhat" occurs in the file. I wrote code using fgetc(), which reads one character at a time from the file, but that is far too slow for a 1 GB file. So I made a buffer of size 1000 (using malloc) to hold 1000 characters at a time from the file, and I used the strstr() function to count the occurrences of the word "sosowhat". The logic is fine. But the problem is that if the "so" part of "sosowhat" is at the end of one buffer and the "sowhat" part is in the next buffer, the word will not be counted. So I used two buffers, old_buffer and current_buffer. At the beginning of each buffer I want to check the last few characters of the old buffer. Is this possible? How can I go back to the old buffer? Is it possible without memmove()? As a beginner, I will be more than happy for your help.
Yes, it can be done. There are several possible approaches.
The first one, which is the cleanest, is to keep a second buffer, as suggested, of the length of the searched word, where you keep the last chunk of the old buffer. (It needs to be exactly the length of the searched word because you store wordLen - 1 characters plus the NUL terminator.) Then the quickest way is to append the first wordLen - 1 characters of the new buffer to this stored chunk and search for your word there. Then continue with your search normally. Of course, you can also create a single buffer that holds both chunks (the last bytes of the old buffer and the first bytes of the new one).
Another approach (which I don't recommend, but which can turn out to be a bit easier in terms of code) would be to fseek wordLen - 1 bytes backwards in the file before each read. This "moves" the chunk stored in the previous approach into the next buffer. It is a bit dirtier, as you will read some of the contents of the file twice. Although that is not noticeable in terms of performance, I again recommend against it in favour of something like the first approach.
Use the same algorithm as with fgetc, only read from the buffers you created. It will be just as efficient, since strstr iterates through the string char by char as well.
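A rough sketch of the first approach, using the single-buffer variant: the last wordLen - 1 bytes of each chunk are carried over to the front of the buffer before the next read, so a word split across two reads is still seen whole. The names count_word, CHUNK and MAX_WORD are invented for the example, memcmp is used instead of strstr so that NUL bytes in the data do no harm, and this version does use memmove for the carry-over. It counts every position where the word occurs, including overlapping ones.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK    4096   /* bytes requested per fread (arbitrary choice) */
#define MAX_WORD 64     /* longest search word this sketch supports */

/* Returns the number of occurrences of word in fp, or -1 for a bad word. */
long count_word(FILE *fp, const char *word)
{
    char buf[CHUNK + MAX_WORD];
    size_t wlen = strlen(word);
    size_t carry = 0;   /* bytes kept from the end of the previous chunk */
    size_t got;
    long count = 0;

    if (wlen == 0 || wlen > MAX_WORD)
        return -1;

    while ((got = fread(buf + carry, 1, CHUNK, fp)) > 0) {
        size_t len = carry + got;
        size_t i;

        /* Check every start position whose match fits in the window. */
        for (i = 0; i + wlen <= len; i++)
            if (memcmp(buf + i, word, wlen) == 0)
                count++;

        /* Keep the last wlen - 1 bytes: too short to hold a whole match,
           so nothing is counted twice. */
        carry = len < wlen - 1 ? len : wlen - 1;
        memmove(buf, buf + len - carry, carry);
    }
    return count;
}
```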

reading file contents in C, keeping track of new lines

I am reading strings in from a data file in C using fscanf() using a loop that goes until EOF. The data file contains strings separated by any white space. As the strings are read in with fscanf() they will be checked for any invalid information. I need to keep track of the line numbers that an item is on so that if an invalid item is detected the line number of that item will be displayed. Like I said though, the data file is not just one string per line. Is there anything more elegant than reading in everything up to a new line, splitting that into individual strings, and checking those strings? Can I determine the line number where the file pointer is currently pointing?
There is no line number counter in the C I/O libraries. You need to keep track of the line number yourself.
I suggest that you read one line at a time using fgets, incrementing your line number counter each time, and then read items from that string using sscanf.
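A small sketch of that suggestion, assuming every line fits in the 1024-byte buffer (a longer line would be split and counted as two). The validity rule is_valid is purely hypothetical, here "all decimal digits", and stands in for whatever check the real program needs. The %n conversion tells us how many characters sscanf consumed, so the next call can continue from there.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical rule for illustration: an item is valid if it is all digits. */
static int is_valid(const char *item)
{
    if (*item == '\0')
        return 0;
    for (; *item != '\0'; item++)
        if (!isdigit((unsigned char)*item))
            return 0;
    return 1;
}

/* Reports each invalid item with its line number; returns how many were found. */
int report_invalid_items(FILE *in, FILE *out)
{
    char line[1024];
    char item[256];
    int lineno = 0;
    int bad = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        const char *p = line;
        int used = 0;

        lineno++;
        /* Pull whitespace-separated items out of the current line. */
        while (sscanf(p, "%255s%n", item, &used) == 1) {
            if (!is_valid(item)) {
                fprintf(out, "line %d: invalid item '%s'\n", lineno, item);
                bad++;
            }
            p += used;  /* continue after the item just read */
        }
    }
    return bad;
}
```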

How to create invisible space until a new line?

I am a beginner in C. I was recently writing a program to print a histogram of the number of instances of each character in an input. Printing the histogram horizontally is easy, but printing it vertically is a challenge.
Please have a look at the following code:
#include <stdio.h>
int main()
{
    /*THIS IS JUST A SAMPLE OF WHAT I WANT TO ASK*/
    int occurrence = 5;
    int i;
    for (i=0;i<occurrence;i++){
        printf("\t*\n");
    }
}
For example, say some letter occurs 5 times, so I have set occurrence to 5, and I print the bar as asterisks. With this code I can print a histogram bar of 5 asterisks. But if I want to print other elements, like the x and y axes, each asterisk line already ends with a \n character, so anything else I print starts on the next line. So I figured out something else.
Now read this code:
#include <stdio.h>
int main()
{
    /*THIS IS JUST A SAMPLE OF WHAT I WANT TO ASK*/
    int instances = 5;
    int i;
    for (i=0;i<instances;i++){
        printf("\t*\t\t\t\t\t\t\t\t\t");
    }
}
What I did here, depending on the width of the output screen, was print 9 tab characters so that the next asterisk moves onto the next line without printing any \n character.
Now the main question: IS THERE ANY OTHER WAY TO CREATE INVISIBLE SPACES UNTIL THE NEXT THING TO BE PRINTED MOVES ON TO THE NEXT LINE?
This question might be stupid but if there is a solution, it will be best for me.
NOTE: If there is no such method of creating blank spaces, then please suggest a good way to create a vertical histogram. If someone wants an improvement to the question, I am ready to make it.
Thanks for the help!
Outputs:
If I use the first code and draw the other chart elements using printf statements, this is the output:
You can see that the bar made of asterisks is not aligned with the x and y axes. This happens because of the \n character.
Instead of hacking around with spaces yourself, you might want to look into a library that handles all that graphical stuff for you. For example, ncurses is a pretty decent library to do pseudo-graphical output on a console. However, "ncurses" seems to be for Unix only, but there may be other libraries for Windows.
If using a library is not an option, I'd suggest to work with a 2-dimensional character buffer (that you treat like a bitmap) instead of writing things directly to the console. It's a lot less "fiddling around". Just watch out to truncate your buffer line size to the console line size before printing, in order to avoid automatic line breaks where you don't want them.
If you do not want to use the curses library (for example, if you have found a printing terminal in a museum and want your program to work on it), you have to reverse the problem.
You must print line by line if you do not use a graphics library. So your program could look like:
compute the occurrence of characters
compute the maximum occurrence
for each line from max occurrence to 0
    compute a line for every character printing a space (not reached)
    for each character
        if the occurrence is greater than the line index put a mark at correct place in the line
    print the line
That's the algo, actual coding is left as an exercise for the reader :-)
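For what it's worth, here is one possible sketch of that line-by-line algorithm. The function name, the two-character column width and the dashed x axis are choices made for the example only. Rows are numbered from the maximum count down to 1, and a column gets a mark on a row when its count reaches that row.

```c
#include <stdio.h>

/* Prints a vertical histogram of counts[0..n-1] to out, top row first. */
void print_vertical_histogram(const int counts[], int n, FILE *out)
{
    int max = 0;
    int row, col;

    /* Find the tallest bar. */
    for (col = 0; col < n; col++)
        if (counts[col] > max)
            max = counts[col];

    /* One output line per height level, from the top down. */
    for (row = max; row >= 1; row--) {
        for (col = 0; col < n; col++)
            fputs(counts[col] >= row ? " *" : "  ", out);
        fputc('\n', out);
    }

    /* A simple x axis under the bars. */
    for (col = 0; col < n; col++)
        fputs("--", out);
    fputc('\n', out);
}
```

Because each output line is built completely before the newline is written, nothing has to be positioned after the fact.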
You might want to think about having a two-dimensional array. Start by filling it with spaces. Then replace the spaces with the character you want to print at the correct index. Using two for loops, traverse the array in order to print. Print a new line at the end of each row. Changing the order of the indices switches between a vertical and a horizontal print. The downsides are having to keep the whole thing in memory, and needing multiple passes: one to set the characters and another to print.

Searching for a combination of characters in a file

I am trying to create a program that reads a file and searches for a specific combination of characters.
For example: "/start/ 4jy42jygsfsf /end/".
So I want to find all the "strings" starting with /start/ and ending with /end/.
In order to do that, I use read() function because the file might be a binary file (it doesn't have to be a file with chars).
I call the read() function like that:
#define BUFFSIZE 4000
// more declarations
while (read(file_descriptor, buffer, BUFFSIZE) > 0)
{
//search for /start/
//then search for /end/
//build a string with all the chars between these two
//keep searching till you reach the end of buffer
}
Assume that every /start/ is followed by an /end/.
The question is:
How do I deal with cases where this combination of characters is cut in half?
For example, let's say that the first time read() is called, I spot /star at the end of the buffer, and the next time read() is called, the second buffer starts with t/ 4jy42jygsfsf /end/.
The combination might get cut anywhere. The solutions I have thought of would result in many, many lines of code. Is there a smart way to deal with all these cases?
When you reach the end of the buffer, record the state of the current partial match, if any. Then when you get the next buffer, you have 4 general cases:
Not inside any text to be matched.
Saw just a beginning / at the end of the last buffer
Currently inside /start/. Another variable records how far you have matched.
Currently inside /end/. Same variable as for /start/ records how far you have matched.
Your states inside the matcher are generally:
Currently not matching anything
Just saw a / - next looking for an 's' or an 'e'.
Matching either start/ or end/.
Matched - either /start/ or /end/.
Based on the partial match, just jump to the right state in the matcher.
OR
You can use the PCRE library. It supports partial matching. But probably is overkill for your purposes.
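A possible sketch of the state-machine idea: the matcher state lives in a struct that survives between read() calls, so it does not matter where a buffer boundary falls. All names here (Matcher, Scanner, scanner_feed, the 256-byte payload limit) are invented for the example. The simple "restart on mismatch" rule is only valid because '/' appears nowhere inside these delimiters except at their ends; arbitrary patterns would need something like a KMP failure table. Payloads that do not fit the buffer are silently skipped in this sketch.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *pat;   /* delimiter being matched */
    size_t len;        /* its length */
    size_t matched;    /* how many leading bytes have matched so far */
} Matcher;

typedef struct {
    Matcher start, end;
    int inside;        /* 1 while between /start/ and /end/ */
    int overflow;      /* 1 if the current payload did not fit */
    char payload[256];
    size_t plen;
} Scanner;

/* Feed one byte; returns 1 when the delimiter has just been completed. */
static int feed(Matcher *m, char c)
{
    if (c != m->pat[m->matched])
        m->matched = 0;                 /* restart, then retry c as first byte */
    if (c == m->pat[m->matched] && ++m->matched == m->len) {
        m->matched = 0;
        return 1;
    }
    return 0;
}

void scanner_init(Scanner *s)
{
    memset(s, 0, sizeof *s);
    s->start.pat = "/start/"; s->start.len = strlen(s->start.pat);
    s->end.pat   = "/end/";   s->end.len   = strlen(s->end.pat);
}

/* Process one buffer; writes each complete payload to out, one per line. */
void scanner_feed(Scanner *s, const char *buf, size_t n, FILE *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        char c = buf[i];

        if (!s->inside) {
            if (feed(&s->start, c)) {
                s->inside = 1;
                s->plen = 0;
                s->overflow = 0;
            }
            continue;
        }
        /* Inside: store the byte (the trailing /end/ is stripped later). */
        if (s->plen < sizeof s->payload)
            s->payload[s->plen++] = c;
        else
            s->overflow = 1;
        if (feed(&s->end, c)) {
            if (!s->overflow) {
                fwrite(s->payload, 1, s->plen - s->end.len, out);
                fputc('\n', out);
            }
            s->inside = 0;
        }
    }
}
```

The read() loop would then just call scanner_feed(&s, buffer, bytes_read, stdout) for each chunk, with scanner_init called once before the loop.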

Remove spaces from a string, but not at the beginning or end

I am trying to remove spaces from a string in C: not from the end or the beginning, just runs of multiple spaces inside the string.
For example
hello  everyone this     is a test
has two spaces between hello and everyone, and five spaces from this to is. Ultimately I would want to remove 1 space from the 2 and 4 from the 5, so every gap has 1 space exactly. Make sense?
This is what I was going to do:
create a pointer and point it at the first element of the string, char[0].
loop over the length of the string
then my logic is: if the character at [i] is a space and the character at [i+1] is a space, then do something
I am not quite sure what a good solution would be from here, bearing in mind I won't be using any pre-built functions. Does anyone have any ideas?
One way is to do it in place. Loop through the string from beginning to end, keeping a write pointer and a read pointer. On each iteration both pointers advance by one. When you encounter a space, copy it as normal, but then keep incrementing the read pointer until a non-space is found (or the end of the string, obviously). Don't forget to add a '\0' at the end, and you now have the same string without the extra spaces.
Are you allowed to use extra memory to create a duplicate of the string, or do you need to do the processing in place?
The easiest will be to allocate memory equal to the size of the original string and copy all characters there. If you meet an extra space, do not copy it.
If you need to do it in place, then create two pointers. One pointing to the character being read and one to the character being copied. When you meet an extra space, then adapt the 'read' pointer to point to the next non space character. Copy to the write position the character pointed by the read character. Then advance the read pointer to the character after the character being copied. The write pointer is incremented by one, whenever a copy is performed.
Example:
write
V
xxxx_xxxx__xxx
^
Read
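A compact sketch of the in-place read/write pointer technique, using no library functions. The name collapse_spaces is made up, and the prev_space flag is one way of remembering whether we are already inside a run of spaces. Note that a run at the very beginning or end of the string is also reduced to a single space rather than removed.

```c
/* Collapses every run of spaces in s to a single space, in place. */
void collapse_spaces(char *s)
{
    char *read = s;      /* next character to examine */
    char *write = s;     /* next position to fill */
    int prev_space = 0;  /* 1 if the previous character was a space */

    while (*read != '\0') {
        /* Copy everything except a space that follows another space. */
        if (*read != ' ' || !prev_space)
            *write++ = *read;
        prev_space = (*read == ' ');
        read++;
    }
    *write = '\0';       /* terminate the (possibly shorter) string */
}
```

Since the write pointer never gets ahead of the read pointer, no character is overwritten before it has been examined.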
A hard part here is that you can not remove an element from the array of characters easily. You could of course make a function that returns a char[] that has one particular element removed. Another option is to make an extra array that indicates which characters you should keep and afterward go over the char[] one more time only copying the characters you want to keep.
This is based on what Goz said, but I think he had finger trouble, because I'm pretty sure what he described would strip out all spaces (not just the second onwards of each run).
EDIT - oops - wrong about Goz, though the "extra one" wording would only cover runs of two spaces correctly.
EDIT - oops - pre-written solution removed...
The general idea, though, is to use the "from" and "to" pointers as others did, but also to preserve some information (state) from one iteration to the next so that you can decide whether you're in a run of spaces already or not.
You could do a find and replace of "  " with " ", and keep doing it until no more matches are found. Inefficient, but logical.
