characters from a stream - file

This question was asked in an interview.
Assume your computer is reading characters one by one from a stream (you don't know the length of the stream until it ends). Note that you have only one character of storage space (so you can't save the characters you've read to something like a string). When you've finished reading, you should return one character from the stream, with every character equally likely to be chosen.
How should I approach this problem? Is there any way to solve it?

It's one of those tricks that you either know or don't:
Take the first character. With probability 1/2 replace it with the second one, otherwise keep the first one; with probability 1/3 replace whatever you hold with the third one, otherwise keep it; and so on.
It works because every time you pick the nth char with probability 1/n, or keep the previous one (which had probability 1/(n-1) of being there) with probability (n-1)/n. The kept character is therefore still there with probability 1/(n-1) * (n-1)/n = 1/n: the (n-1) factors cancel.
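A minimal sketch of that trick in C, assuming the stream is a FILE * and that rand() is random enough for the purpose. The function name pick_random_char is made up for illustration. Besides the one kept character it also needs a counter and the character just read, and rand() % n is slightly biased and only meaningful while n stays at or below RAND_MAX, so treat it as an outline rather than a finished implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns one character of the stream, each position
 * chosen with (approximately) equal probability, or EOF if the stream is
 * empty. Assumption: rand() % n is an acceptable stand-in for a uniform
 * draw from 0..n-1. */
int pick_random_char(FILE *fp)
{
    int chosen = EOF;       /* the single character we keep */
    unsigned long n = 0;    /* how many characters we have seen so far */
    int c;

    while ((c = getc(fp)) != EOF) {
        n++;
        /* Replace the kept character with probability 1/n. */
        if ((unsigned long)rand() % n == 0)
            chosen = c;
    }
    return chosen;
}
```

Calling srand() once before use, for example with the current time, gives a different pick on each run.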

Related

Is it possible to count the frequency of a word in a file precisely using two buffers in C?

I have a file of size 1GB. I want to find out how many times the word "sosowhat" occurs in the file. I've written code using fgetc() which reads one character at a time from the file, but that is far too slow for a file of size 1GB. So I made a buffer of size 1000 (using malloc()) to hold 1000 characters at a time from the file, and I used the strstr() function to count the occurrences of the word "sosowhat". The logic is fine. But the problem is that if the part "so" of "sosowhat" is located at the end of one buffer and the "sowhat" part at the start of the next, the word will not be counted. So I used two buffers, old_buffer and current_buffer. At the beginning of each buffer I want to check against the last few characters of the old buffer. Is this possible? How can I go back to the old buffer? Is it possible without memmove()? As a beginner, I will be more than happy for your help.
Yes, it can be done. There are several possible approaches.
The first one, which is the cleanest, is to keep a second buffer, as suggested, of the length of the searched word, where you keep the last chunk of the old buffer. (It needs to be exactly the length of the searched word because you store wordLen - 1 characters plus the NUL terminator.) Then the quickest way is to append the first wordLen - 1 characters of the new buffer to this stored chunk and search for your word there; for that the buffer has to be large enough to hold both chunks (the last bytes from the old buffer and the first bytes from the new one), that is 2 * (wordLen - 1) characters plus the terminator. Then continue with your search normally.
Another approach (which I don't recommend, but which can turn out to be a bit easier in terms of code) would be to fseek() wordLen - 1 bytes backwards in the file after each read. This "moves" the chunk stored in the previous approach into the next buffer. It is a bit dirtier, as you read some of the contents of the file twice. Although that is not noticeable in terms of performance, I still recommend against it in favour of something like the first approach.
Use the same algorithm as with fgetc(), only read from the buffers you created. It will be just as efficient, since strstr() iterates through the string character by character as well.
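Here is a rough sketch of the carry-over idea, under a few assumptions: the name count_word and the CHUNK size are invented for this example, the word is no longer than CHUNK, and matches are counted without overlapping. It uses a single buffer with room for the leftover tail plus a fresh chunk, and memcmp() instead of strstr() so that NUL bytes in the file do not cut the search short. The only memmove() is on the short tail (fewer than wordLen bytes), not on the whole buffer.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 4096  /* assumed read size; any value >= the word length works */

/* Hypothetical helper: counts non-overlapping occurrences of `word` in fp,
 * including occurrences that straddle two reads. */
long count_word(FILE *fp, const char *word)
{
    char buf[2 * CHUNK];        /* leftover tail + one fresh chunk */
    size_t wlen = strlen(word);
    size_t carry = 0;           /* bytes kept from the previous read */
    size_t got;
    long count = 0;

    if (wlen == 0 || wlen > CHUNK)
        return 0;

    while ((got = fread(buf + carry, 1, CHUNK, fp)) > 0) {
        size_t len = carry + got;
        size_t i = 0;

        /* Test every start position where a whole word still fits. */
        while (i + wlen <= len) {
            if (memcmp(buf + i, word, wlen) == 0) {
                count++;
                i += wlen;
            } else {
                i++;
            }
        }

        /* Fewer than wlen bytes remain untested: carry them to the front
         * so the next read is appended right after them. */
        carry = len - i;
        memmove(buf, buf + i, carry);
    }
    return count;
}
```

Usage would be along the lines of opening the file with fopen(path, "rb") and calling count_word(fp, "sosowhat").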

How to dynamically change the string from the i/o stream in c

I was looking at a problem in K&R (Exercise 1-18), which asks you to remove any trailing blanks or tabs. That got me thinking about text messengers like WhatsApp. Let's say I am writing the word "parochial": the moment I have typed "paro", it shows "parochial" as an option, and when I click on it, it replaces the entire word (even if I misspelled the word, it is replaced when I choose an option).
What I am thinking is that the pointer goes back to the start of the word, or rather that at the start of every new word I type, a pointer gets fixed to the first letter, and if I choose an option it replaces that entire word in the stream (I don't know if I'm thinking in the right direction).
I can use getchar() to move to the next letter, but how do I:
1: Go backward from the current position in the stream?
(By using fseek()?)
2: Fix a pointer at a position in an I/O stream, so that I can pin it to the beginning of a new word?
Please tell me whether my approach is correct or whether I need to understand a different concept. Thanks in advance.
Standard streams are mainly for going forward*, minimizing the number of IO system calls, and for avoiding the need to keep large files in memory at once.
A GUI app is likely to want to keep all of its display output in memory, and when you have the whole thing in memory, going back and forth is just a simple matter of incrementing and decrementing pointers or indices.
*(random seeks aren't always optimal and they limit you from doing IO on nonseekable files such as pipes or sockets)
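To illustrate the in-memory idea, here is a small sketch that replaces the word at the end of a text buffer by walking an index backward to the start of that word. The function name replace_last_word and the fixed-capacity buffer are assumptions made for the example; a real editor would more likely use a growable buffer.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: replaces the last (possibly partial) word in `text`
 * with `replacement`. `cap` is the total size of the `text` array.
 * Returns 0 on success, -1 if the result would not fit (text unchanged). */
int replace_last_word(char *text, size_t cap, const char *replacement)
{
    size_t start = strlen(text);
    size_t rlen = strlen(replacement);

    /* Walk the index back until it sits on the first letter of the word. */
    while (start > 0 && !isspace((unsigned char)text[start - 1]))
        start--;

    if (start + rlen + 1 > cap)
        return -1;

    /* Overwrite from the word start, including the terminating '\0'. */
    memcpy(text + start, replacement, rlen + 1);
    return 0;
}
```

With a buffer holding "he is paro", calling replace_last_word(buf, sizeof buf, "parochial") leaves "he is parochial" in the buffer; no stream seeking is involved.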

Base conversion from any base to any base in C (up to 36)

I'm looking for a base conversion function in C that can do conversions between bases from 2 up to 36, including bases that use the characters A-Z.
So far, the functions I have found on the web only deal with base 2, base 10 and hex, and are a bit limited.
For this project, it would probably help to understand how bases work. In any case, let's walk through a process for how one might convert to, say, base twelve. This should be the simplest method to implement.
First up, we have our decimal number, since that's an easy place to start. Let's say, I dunno, 1452 is our number. We'll also need an array of characters for what each character is, since that'll be a lot easier than a straight ASCII conversion, where the number characters and letter characters are separated.
int dec=1452;
int toBase=12;
char outputs[36]={'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
Following that, we probably will only be OUTPUTTING the result in another base - it doesn't make sense to store it multiple ways, and makes your conversion process simpler by only converting from one base to any other given. We could store the result in a character array, but again, we already have the number stored - no point.
For this method I'm going to describe, we'll need a buffer variable to keep track of our number as we convert parts of it.
int buf=dec;
Next up, we'll start counting spaces back in the base we're going to, 12, and see what each space is worth. We'll continue until we pass our number, then backtrack one. We'll also need to save what space we're on for a for loop from that to the first space later.
int space=0;
long long place=1;//What the current space is worth: toBase to the power 'space'
while(place<=buf){
place*=toBase;
space++;
}//Braces added for clarity
space--;
place/=toBase;//Backtrack to the highest space that still fits in our number
Now, this is the main calculation loop, where we'll output the result. Again, the original number is still stored in 'dec,' so we don't need to worry about loss of data or changing it at all.
int i;
for(i=space;i>=0;i--){//We have set up the for loop to check each space as we progress
int modResult=(int)(buf/place);//Despite the name this is a division: the digit that goes in this space
buf-=modResult*place;//We have that, so take it out of the number
printf("%c",outputs[modResult]);//printf() needs <stdio.h>
place/=toBase;//Move down to the next space
}
Because of the way we're doing this, going from the top space to the bottom, modResult will never be higher than the highest digit our base allows. With this, your program will output the resulting number to the console. Also, keep in mind that this only outputs the number - for the purposes of storage and calculation, it's much simpler to use the built-in functions that use base 10. Furthermore, be careful that your toBase variable never goes below 2 or above 36, and note that a value of 0 prints nothing with this loop, so handle that case separately.
As a further note, I numbered the digits (spaces), from right to left, starting at zero, because the far right space is 1, represented by your base to the zeroth power. Hope this helps.
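For the "any base to any base" part of the question, one possible sketch is to parse the input string into an integer and then write that integer out in the target base. The names parse_in_base and format_in_base are invented for this example, the digit table is the same 0-9, A-Z alphabet as above, and overflow of unsigned long is not checked, so this only works for values that fit in that type.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char DIGITS[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Hypothetical helper: reads `s` as a number in `base` (2..36).
 * Returns 0 on success, -1 on a bad base, empty input or invalid digit.
 * Assumption: the value fits in unsigned long (overflow is not detected). */
int parse_in_base(const char *s, int base, unsigned long *out)
{
    unsigned long value = 0;

    if (base < 2 || base > 36 || *s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        const char *p = strchr(DIGITS, toupper((unsigned char)*s));
        if (p == NULL || p - DIGITS >= base)
            return -1;
        value = value * (unsigned long)base + (unsigned long)(p - DIGITS);
    }
    *out = value;
    return 0;
}

/* Hypothetical helper: writes `value` in `base` (2..36) into dst.
 * Returns 0 on success, -1 on a bad base or if dst is too small. */
int format_in_base(unsigned long value, int base, char *dst, size_t cap)
{
    char tmp[sizeof(unsigned long) * 8]; /* enough digits even for base 2 */
    size_t n = 0;

    if (base < 2 || base > 36)
        return -1;
    do {                        /* produces digits from right to left */
        tmp[n++] = DIGITS[value % (unsigned long)base];
        value /= (unsigned long)base;
    } while (value > 0);
    if (n + 1 > cap)
        return -1;
    for (size_t i = 0; i < n; i++)  /* reverse into the output */
        dst[i] = tmp[n - 1 - i];
    dst[n] = '\0';
    return 0;
}
```

Converting 1452 from base ten to base twelve this way gives "A10", the same digits the loop above prints. The do/while form also handles 0, and taking remainders avoids having to find the highest space first.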

how to read last n lines from a file in C

It's a Microsoft interview question.
Read the last n lines of a file using C (precisely).
There could be many ways to achieve this; a few of them are:
-> Simplest of all: in the first pass, count the number of lines in the file, and in the second pass display the last n lines.
-> Or maintain a doubly linked list with a node for every line and display the last n lines by traversing backward from the tail to the nth-last node.
-> Implement something like tail -n fname.
-> To optimize further, we can use a double pointer (an array of n line buffers), with every line stored dynamically in round-robin fashion until we reach the end of the file.
For example, if there are 10 lines in the file and we want to read the last 3 lines, we could create an array of buffers such as buf[3][] and at run time keep mallocing and freeing the buffers in a circular way until we reach the last line, keeping a counter to know the current index into the array.
Can anyone please help me with a more optimized solution, or at least tell me whether any of the above approaches would give the correct answer, or point me to another popular approach for this kind of question?
You can use a queue to store the last n lines seen. When you reach EOF, just print the queue.
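A minimal sketch of the queue idea, using a fixed array of n line slots as a circular buffer. The name print_last_lines and the MAX_LINE limit are assumptions for the example; a line longer than MAX_LINE - 1 characters is read in pieces and each piece counts as a line here, which a fuller version would have to handle.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024   /* assumed maximum line length, including '\n' */

/* Hypothetical helper: writes the last n lines of `in` to `out`.
 * Returns the number of lines written, or -1 if allocation fails. */
int print_last_lines(FILE *in, int n, FILE *out)
{
    char line[MAX_LINE];
    char (*ring)[MAX_LINE];
    long seen = 0;              /* total lines read so far */
    long count, i;

    if (n <= 0)
        return 0;
    ring = malloc((size_t)n * sizeof *ring);
    if (ring == NULL)
        return -1;

    /* Each new line overwrites the oldest slot. */
    while (fgets(line, sizeof line, in) != NULL) {
        strcpy(ring[seen % n], line);
        seen++;
    }

    /* Print the surviving slots from oldest to newest. */
    count = seen < n ? seen : n;
    for (i = seen - count; i < seen; i++)
        fputs(ring[i % n], out);

    free(ring);
    return (int)count;
}
```

This reads the file once and works on non-seekable input such as pipes, at the cost of holding n lines in memory.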
Another way is to read blocks of 1024 bytes from the end of the file towards the beginning. Stop when you have found n \n characters and print out the last n lines.
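The backward block scan could look roughly like this. The function name last_lines_offset is made up, and the sketch assumes a seekable file opened in binary mode on a platform where seeking to SEEK_END works for such streams. It only finds the offset where the last n lines begin; printing is then a matter of seeking there and copying to the end.

```c
#include <stdio.h>

#define BLOCK 1024  /* block size suggested in the answer above */

/* Hypothetical helper: returns the byte offset at which the last n lines
 * of fp begin, 0 if the file has n lines or fewer, or -1 on error
 * (including n <= 0). Assumes a seekable stream in binary mode. */
long last_lines_offset(FILE *fp, int n)
{
    char block[BLOCK];
    long size, pos;
    int newlines = 0;

    if (n <= 0 || fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;

    pos = size;
    while (pos > 0) {
        long step = pos < BLOCK ? pos : BLOCK;
        long i;

        pos -= step;
        if (fseek(fp, pos, SEEK_SET) != 0)
            return -1;
        if (fread(block, 1, (size_t)step, fp) != (size_t)step)
            return -1;

        for (i = step - 1; i >= 0; i--) {
            if (block[i] != '\n')
                continue;
            /* A newline that is the very last byte only terminates the
             * final line, so it does not separate two lines. */
            if (pos + i == size - 1)
                continue;
            if (++newlines == n)
                return pos + i + 1;   /* first byte after that newline */
        }
    }
    return 0;   /* fewer than n separators: the whole file qualifies */
}
```

For a large file this touches only the tail of the file, which is why it tends to be preferred over reading everything from the start.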
You can have two file pointers, both initially pointing to the beginning of the file.
Keep advancing the first pointer until it finds a '\n' character, and store the position of the file pointer each time it finds one.
Once it finds the (n+1)th '\n', assign the first stored position to the second file pointer. Keep doing the same until EOF.
So when the first file pointer is at EOF, the second will be n '\n' characters back. Then print all characters from the second file pointer to EOF.
This solution prints the last n lines of the file in a single pass.
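One way to read this answer is that the trailing "pointer" is really a small queue of remembered positions. The sketch below is that interpretation, not necessarily what the author had in mind: it keeps the start offsets of the most recent n lines in a circular array and returns the oldest one at EOF. The name trailing_offset is invented, and offsets are counted in bytes from wherever reading starts, so it assumes a binary-mode stream read from the beginning.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads fp to EOF once and returns the offset at
 * which the last n lines begin (0 if there are n lines or fewer),
 * or -1 on a bad n or allocation failure. */
long trailing_offset(FILE *fp, int n)
{
    long *starts;           /* start offsets of the most recent n lines */
    long lines = 0;         /* number of lines started so far */
    long pos = 0;           /* offset of the character about to be read */
    long result;
    int at_line_start = 1;
    int c;

    if (n <= 0)
        return -1;
    starts = malloc((size_t)n * sizeof *starts);
    if (starts == NULL)
        return -1;

    while ((c = getc(fp)) != EOF) {
        if (at_line_start) {
            starts[lines % n] = pos;    /* overwrite the oldest entry */
            lines++;
            at_line_start = 0;
        }
        if (c == '\n')
            at_line_start = 1;
        pos++;
    }

    if (lines == 0)
        result = 0;
    else
        result = starts[(lines > n ? lines - n : 0) % n];
    free(starts);
    return result;
}
```

After the call, fseek(fp, result, SEEK_SET) followed by a copy loop prints the last n lines; only n offsets are stored rather than n whole lines.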
How about using a memory-mapped file and scanning the file backward? This eliminates the hard work of updating the buffer window every time the lines happen to be longer than your buffer space. Then, when you find a \n, push its position onto a stack. This works in O(L), where L is the number of characters to output. So there is nothing really better than that, is there?

Trimming a char array in C

I am working on a coding assignment and have run into a problem. It is not clear to me how to tackle it, probably due to a lack of sleep, but anyway: I need to trim a char array of its white spaces for this assignment.
The solution I thought of involves a second char array: simply copy the non-white-space characters to that array and I'm done. But how can I create a char array without knowing its size? At that moment I do not yet know the size; I still need to trim the array in order to know how many characters need to be copied to the new array, and that number varies in the assignment.
I know there are a lot of good questions out here on Stack Overflow, but I think this has more to do with the thought process than with the correct syntax.
My second problem is how to perform an fscanf/fgetc on a char array, since those functions need a stream. Is it sufficient to give them a pointer rather than a stream?
If you make the change in place, the simple way is to shift every character after a space back by one, and repeat until the end of the array. This is very inefficient, since every space costs another pass over the rest of the array.
If making a new copy, make a new array of the same length, and then do as you were doing (copy all the non-space characters). If you copy the \0 character as well, then there will be no string termination issue. This is much more efficient.
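A short sketch of the copy approach, assuming the input is a NUL-terminated string and that "white space" means whatever isspace() reports. The name copy_without_spaces is invented for the example. Allocating strlen(src) + 1 bytes sidesteps the "unknown size" problem, because the result can never be longer than the input.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src with all
 * white-space characters removed, or NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *copy_without_spaces(const char *src)
{
    char *dst = malloc(strlen(src) + 1);   /* upper bound on the size */
    char *out = dst;

    if (dst == NULL)
        return NULL;
    for (; *src != '\0'; src++) {
        if (!isspace((unsigned char)*src))
            *out++ = *src;
    }
    *out = '\0';    /* terminate the shorter copy */
    return dst;
}
```

If the wasted bytes matter, the result can be shrunk afterwards with realloc(), though for an assignment that is rarely necessary.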
Going by your comments, it appears you may have the option to input the array in any form you wish. I would then recommend that instead of doing text manipulations later on, just input the string in the form you need.
You can simply use scanf or fscanf repeatedly, to input the separate words into the same array. This will take care of all the whitespaces.
Here is one partial idea: You can make a first pass on the char array and count the blanks, then take the string length minus the blanks for the second array, then perform your copy across skipping the blanks.
You could also create a pass through the array:
Test until end of array:
Is my (Current/Index) position blank? (A space)
If so, grab next available non-blank value and put it there.
then index++
If not, index++
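The single pass outlined above can be written with two positions into the same array, one for reading and one for writing, which avoids repeated shifting. This is a sketch of that variant rather than a literal translation of the steps; the name remove_spaces_in_place is made up, and it assumes a modifiable NUL-terminated string.

```c
#include <ctype.h>

/* Hypothetical helper: removes all white-space characters from s in place.
 * `read` visits every character once; `write` trails behind and marks
 * where the next kept character goes. */
void remove_spaces_in_place(char *s)
{
    char *write = s;
    const char *read;

    for (read = s; *read != '\0'; read++) {
        if (!isspace((unsigned char)*read))
            *write++ = *read;
    }
    *write = '\0';
}
```

Because write never gets ahead of read, nothing is overwritten before it has been examined, and the whole job takes one pass.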
Not sure on the second, will do some checking and see if I can find a good answer there too.

Resources