I want to print every character that is read by readline as it's read in.
Currently I'm able to obviously print out everything after it's fully read in.
This is for a shell that is being written.
readline() is a blocking function and waits for a complete line.
Just use read() in a loop. read() blocks until input is available and then returns the number of bytes read: 1 if you ask for one byte (or up to n if n bytes are waiting in stdin and you provide a buffer big enough). It returns 0 at end of file and -1 on error. Note that a terminal in its default (canonical) mode hands input to read() only once a full line has been typed.
Standard input normally has file descriptor 0 reserved for it, so your code might look like this:
char buf[1];
ssize_t i;
while (1) {
    i = read(0, buf, 1);   /* needs <unistd.h> */
    if (i <= 0)
        break;             /* 0: end of file, -1: error */
    // process the character in buf[0]
}
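One caveat for the shell use case: on a terminal in its default canonical mode, the loop above only receives characters after a whole line has been entered. Here is a minimal sketch, assuming a POSIX system with termios, of switching the terminal to character-at-a-time input. The helper names enter_char_mode and leave_char_mode are made up for illustration.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put the terminal behind fd into non-canonical,
   no-echo mode and save the previous settings in *saved.
   Returns 0 on success, -1 on error (for example, fd is not a terminal). */
int enter_char_mode(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) == -1)
        return -1;
    t = *saved;
    t.c_lflag &= ~(ICANON | ECHO);  /* no line buffering, no automatic echo */
    t.c_cc[VMIN] = 1;               /* read() returns as soon as 1 byte arrives */
    t.c_cc[VTIME] = 0;              /* no inter-byte timeout */
    return tcsetattr(fd, TCSANOW, &t);
}

/* Hypothetical helper: restore the settings saved by enter_char_mode(). */
int leave_char_mode(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSANOW, saved);
}
```

Call enter_char_mode(0, &saved) before the read() loop and leave_char_mode(0, &saved) before exiting, so the user's terminal is left as it was found. enter_char_mode returns -1 when descriptor 0 is not a terminal, for example when input is redirected from a file.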
Related
#include <stdio.h>
#include <unistd.h>
char buff[1];
int main() {
    int c;
    c = getchar();
    printf("%d\n", c); // output: -1
    c = getchar();
    printf("%d\n", c); // output: -1
    int res;
    // here I get a prompt for input. What happened to EOF?
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    return 0;
}
The output shown in the code comments is the result of simply typing Ctrl-D (EOF on macOS).
I'm a bit confused about the behaviour of getchar(), especially when compared to read.
Shouldn't the read system calls inside the while loop also return EOF? Why do they prompt the user? Has some sort of stdin clear occurred?
Considering that getchar() uses the read system call under the hood how come they behave differently? Shouldn't the stdin be "unique" and the EOF condition shared?
How come, in the following code, both read loops see end of file when a Ctrl-D input is given?
int res;
while ((res = read(0, buff, 1)) > 0) {
printf("Hello\n");
}
while ((res = read(0, buff, 1)) > 0) {
printf("Hello\n");
}
I'm trying to find a logic behind all this. Hope that someone could make it clear what EOF really is and how it really behaves.
P.S. I'm using a macOS machine.
Once the end-of-file indicator is set for stdin, getchar() does not attempt to read.
Clear the end-of-file indicator (e.g. clearerr() or others) to re-try reading.
The getchar function is equivalent to getc with the argument stdin.
The getc function is equivalent to fgetc ...
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
read() still tries to read each time.
Note: Reading via a FILE *, like stdin, does not attempt to read if the end-of-file indicator is set. Yet even if the error indicator is set, a read attempt still occurs.
macOS is a derivative of the BSD Unix systems. Its stdio implementation does not come from GNU software, so it is a different implementation. When a read(2) system call returns 0 as the number of characters read, the stream's end-of-file indicator is set, and stdio does not call read(2) again until that indicator is cleared. This produces the behaviour you observe. Call clearerr(stream); on the FILE * before issuing the next getchar(3) call, and everything will be fine. You can do the same with glibc, and then your program will run the same under either implementation of stdio (glibc vs. BSD).
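As a rough sketch of the clearerr() advice, the following wrapper clears a sticky end-of-file indicator before trying to read again. The name fgetc_retry is hypothetical and not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: behaves like fgetc(), except that if the stream's
   end-of-file indicator is already set it is cleared first, so that a
   real read is attempted again. */
int fgetc_retry(FILE *fp)
{
    if (feof(fp))
        clearerr(fp);   /* resets both the end-of-file and error indicators */
    return fgetc(fp);
}
```

In the program from the question, using fgetc_retry(stdin) in place of the second getchar() should make it wait for terminal input again after the first Ctrl-D, the same way the read() loops do.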
I'm trying to find a logic behind all this. Hope that someone could make it clear what EOF really is and how it really behaves.
EOF is simply a constant (normally -1) that is different from every character value getchar(3) can return. getchar() returns an int, not a char: either a character value in the interval 0..255 or EOF. The range of possible results is extended by one value to represent the end-of-file condition, but EOF itself is not a char. The getchar family of functions (getchar, fgetc, etc.) reports end of file this way because read(2) signals it with a return value of 0 (zero characters returned), which does not map to any character. This is also compatible with files that contain Ctrl-D characters (ASCII EOT, decimal value 4) that do not represent an END OF FILE condition: when you read an ASCII EOT from a file, it appears as a normal character of decimal value 4.
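To illustrate why the result has to be kept in an int, here is a small sketch that counts the bytes in a stream. The function name count_bytes is made up for this example. With a plain char variable, a data byte such as 0xFF could compare equal to EOF on some platforms and stop the loop early.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream. */
long count_bytes(FILE *fp)
{
    int c;        /* int, not char: must hold every byte value plus EOF */
    long n = 0;

    while ((c = fgetc(fp)) != EOF)
        n++;      /* bytes such as 0x04 (Ctrl-D) or 0xFF are ordinary data here */
    return n;
}
```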
The Unix tty implementation, on the other hand, allows a special character (Ctrl-D, ASCII EOT/END OF TRANSMISSION, decimal value 4) to be used in line input mode to indicate the end of the stream to the driver. It is a special character in the same way as ASCII CR or ASCII DEL (which trigger line editing of the input before it is fed to the program). On Ctrl-D the terminal hands over all the input it has collected so the application can read it; if there is none, nothing is read and you get end of file. So Ctrl-D is special only to the Unix tty driver, and only when it is working in canonical mode (line input mode). In line mode, then, there are only two ways to deliver input to the program:
pressing the RETURN key (the terminal sends ASCII CR, which the tty driver translates into ASCII LF, the famous '\n' character), so the pending line and the ASCII LF character are delivered to the program
pressing the Ctrl-D key. This makes the terminal take everything that was keyed in up to this moment and send it to the program, without adding the Ctrl-D itself or any other character to the input buffer. This means that, if the input buffer was empty, nothing is sent to the program and the read(2) call effectively reads zero characters.
To sum up: in every scenario, the read(2) system call normally blocks in the kernel until one or more characters are available. Only at end of file does it unblock and return zero characters to the program. THIS SHOULD BE YOUR END OF FILE INDICATION. A read can return an incomplete buffer (fewer characters than the count you passed as a parameter) before a true END OF FILE is signalled, so almost every program does another read to check whether that was a short read or really an end-of-file indication.
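A minimal sketch of that "read again" pattern, assuming a POSIX system. The helper name read_full is hypothetical. It keeps calling read(2) until the requested count has arrived or read(2) returns 0, which is the real end-of-file indication.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read up to `want` bytes, tolerating short reads.
   Returns the number of bytes stored (less than `want` only if end of
   file was reached first), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;       /* short read: loop and ask for the rest */
    }
    return (ssize_t)got;
}
```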
Finally, what if you want to enter a Ctrl-D character as itself into a file? There is another special character in the tty implementation that removes the special behaviour of the character it precedes. On today's systems that character is Ctrl-V by default, so if you want to enter a special character (even Ctrl-V itself) you have to precede it with Ctrl-V. Entering Ctrl-D into the file therefore means typing Ctrl-V + Ctrl-D.
Say I make an input :
"Hello world" // hit a new line
"Goodbye world" // second input
How could I scan through the two lines and store them separately in two different arrays? I believe I need to use getchar until it hits a '\n'. But how do I scan for the second input?
Thanks in advance. I am a beginner in C, so it would be helpful to do it without pointers, as I haven't covered that topic.
Try this code out :
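#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int flx = 0, fly = 0;
    int a;                  /* int, not char, so EOF can be told apart from data */
    char b[10][100];
    while (1)
    {
        a = getchar();
        if (a == EOF) exit(0);
        else if (a == '\n')
        {
            b[flx][fly] = '\0';        /* terminate the finished line */
            if (++flx == 10) exit(0);  /* the array is full */
            fly = 0;
        }
        else if (fly < 99)             /* leave room for the terminator */
        {
            b[flx][fly++] = a;
        }
    }
}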
Here I use a two-dimensional array to store the strings. I read the input character by character. First I create an infinite loop which keeps reading characters. If the user enters the end-of-file character, the input stops. If there is a newline character, the flx variable is incremented and the following characters are stored in the next array position. You can refer to the stored strings as b[n], where n is the index.
The function that you should probably look at is fgets. At least on my system, the definition is as follows:
char *fgets(char * restrict str, int size, FILE * restrict stream);
So a very simple program to read input from the keyboard would run something like this:
#include <stdio.h>
#include <stdlib.h>
#define MAXSTRINGSIZE 128
int main(void)
{
char array[2][MAXSTRINGSIZE];
int i;
void *result;
for (i = 0; i < 2; i++)
{
printf("Input String %d: ", i);
result = fgets(&array[i][0], MAXSTRINGSIZE, stdin);
if (result == NULL) exit(1);
}
printf("String 1: %s\nString 2: %s\n", &array[0][0], &array[1][0]);
exit(0);
}
That compiles and runs correctly on my system. The only issue with fgets is that it retains the newline character \n in the string, so if you don't want it, you will need to remove it. As for the FILE * parameter, stdin is a predefined FILE * that refers to standard input, or file descriptor 0. There are also stdout for standard output (file descriptor 1) and stderr for error messages and diagnostics (file descriptor 2). The file descriptor numbers correspond to the ones used in a shell like so:
$$$-> cat somefile > someotherfile 2>&1
What that does is take the output of file descriptor 2 and redirect it to 1, with 1 in turn being redirected to a file. In addition, I am using the & operator because we are addressing parts of an array, and the functions in question (fgets, printf) require pointers. As for the result, the man page for gets and fgets states the following:
RETURN VALUES
Upon successful completion, fgets() and gets() return a pointer to the string. If end-of-file occurs before any characters are read,
they return NULL and the buffer contents remain unchanged. If an
error occurs, they return NULL and the buffer contents are
indeterminate. The fgets() and gets() functions do not distinguish
between end-of-file and error, and callers must use feof(3) and
ferror(3) to determine which occurred.
So to make your code more robust, if you get a NULL result, you need to check for errors using ferror or for end of file using feof and respond appropriately. Furthermore, never EVER use gets. It has no way of knowing how large your buffer is, so the only way to use it securely would be to see into the future and know how long the input will be, which clearly nobody can do. It will just open you up to a buffer overflow attack.
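Putting the last few points together, here is a hedged sketch of a wrapper around fgets that strips the trailing newline and tells end of file apart from an error. The name read_line and its return convention are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (at most size - 1 characters).
   Returns 1 if a line was read, 0 on end of file, -1 on a read error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return ferror(fp) ? -1 : 0;   /* fgets alone cannot tell these apart */
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if present */
    return 1;
}
```

For the two-line input in the question, calling read_line(stdin, array[0], MAXSTRINGSIZE) and then read_line(stdin, array[1], MAXSTRINGSIZE) would leave "Hello world" and "Goodbye world" in the two rows, without the newlines.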
Consider the following, albeit very messy, code in C:
#include<stdio.h>
int main() {
char buf[3]; //a new, small buffer
FILE *fp = fopen("test.txt", "r"); //our test file, with the contents "123abc"
setvbuf(fp, buf, _IOFBF, 2); //we assign our small buffer as fp's buffer \
//in fully buffered mode
char character = fgetc(fp); // get the first character...
character = fgetc(fp); // and the next...
character = fgetc(fp); // and the next... (third character, '3')
buf[2] = '\0'; //add a terminating line for display
fputs(buf, stderr); //write our buffer to stderr, should show up immediately
}
Compiling and running the code prints '3a' as the contents of our self-designated buffer, buf. My question is: how does this buffer get filled? Does a call to fgetc() trigger several reads until the buffer is full and then stop (we only made three calls to fgetc, which should not include the 'a')? The first buffer content was "12", so when another fgetc() call is made and the pointer moves past the end of the buffer, is the buffer purged and then filled with the next block of data, or simply overwritten? I understand buffer sizes are platform dependent, so I'm more concerned with how, in general, an fopen()ed stream in read mode pulls characters into its buffer.
The buffer, and exactly how and when it is filled, is an implementation detail inside the stdio package. But the way it is likely to be implemented is that fgetc gets one character from the buffer, if there are characters available in the buffer. If the buffer is empty, it fills it by reading (in your case) two more characters from the file.
So your first fgetc will read 12 from the file and put it in the buffer, and then return '1'. Your second fgetc will not read from the file, since a character is available in the buffer, and return '2'. Your third fgetc will find that the buffer is empty, so it will read 3a from the file and put it in the buffer, and then return '3'. Therefore, when you print the content of the buffer, it will be 3a.
Note that there are two levels of "reading" happening here. First you have your fgetc calls, and then, below that level, code inside the stdio package which is reading from the file. If we assume this is on a Unix or Linux system, the second type of reading is done using the system call read(2).
The lower-level reading fills the entire buffer at once, so you don't need as many calls to read as calls to fgetc. (Which is the entire point of having the buffer.)
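The two levels can be modelled with a toy buffered reader. This is only an illustrative sketch under the assumption of a POSIX read(2). It is not how any particular stdio is implemented, and struct tiny_stream and tiny_getc are invented names.

```c
#include <stddef.h>
#include <unistd.h>

#define TINY_BUFSZ 2   /* same size as the buffer in the question */

/* Toy model of a buffered input stream (not a real stdio structure). */
struct tiny_stream {
    int fd;                 /* underlying file descriptor */
    char buf[TINY_BUFSZ];   /* the buffer */
    size_t pos;             /* index of the next unread byte in buf */
    size_t len;             /* number of valid bytes in buf */
    int refills;            /* how many times read(2) has been called */
};

/* Returns the next byte (0..255), or -1 at end of file or on error. */
int tiny_getc(struct tiny_stream *s)
{
    if (s->pos == s->len) {              /* buffer exhausted: refill it */
        ssize_t n = read(s->fd, s->buf, TINY_BUFSZ);
        s->refills++;
        if (n <= 0)
            return -1;
        s->len = (size_t)n;              /* old contents are simply overwritten */
        s->pos = 0;
    }
    return (unsigned char)s->buf[s->pos++];
}
```

With "123abc" as input, three calls to tiny_getc cause only two refills, and the buffer holds "3a" afterwards, which matches what the question observed.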
I am having a confusion regarding the following code,
#include<stdio.h>
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
read(stdin,buf,4); //attaching to stdin
printf("buffer is %s\n",buf);
return 1;
}
Suppose that at runtime I provide the input 10abcd. As per my understanding, the following should happen:
scanf should place 10 in data
and abcd will still be in the stdin buffer
when read tries to read stdin (abcd is already there) it should place abcd into buf
so printf should print abcd
But that is not happening; printf shows no output.
Am I missing something here?
First of all, read(stdin, ...) should give warnings (if you have them enabled), which you would be wise to heed. read() takes an integer file descriptor as its first parameter, specifying which channel to read from. stdin is of type FILE *.
Even if you changed it to read(0, ...), this is not recommended practice. scanf reads from FILE *stdin, which is buffered on top of file descriptor 0. read(0, ...) reads directly from the underlying file descriptor and ignores any characters that stdio has already buffered. This will cause strange results unless stdin is set unbuffered.
Ignoring mechanical issues related to the syntax of the read() function call, there are two cases to consider:
Input is from a terminal.
Input is from a file.
Terminal
No data will be available for reading until the user hits return. At that point, the standard I/O library will read all the available data into the buffer associated with stdin (that would be "10abcd\n"). It will then parse the number, leaving the a and everything after it in the buffer to be read later by other standard I/O functions.
When the read() occurs, it will also wait for the user to provide some input. It has no clue about the data in the stdin buffer. It will hang until the user hits return, and will then read the next lot of data, returning up to 4 bytes in the buffer (no null termination unless it so happens that the fourth character is an ASCII NUL '\0').
File
Actually, this isn't all that much different, except that instead of reading a line of data into the buffer, the standard I/O library will probably read an entire buffer full, (BUFSIZ bytes, which might be 512 or larger). It will then convert the 10 and leave the a for later use. (If the file is shorter than the buffer size, it will all be read into the stdin buffer.)
The read will then collect the next 4 bytes from the file. If the whole file was read already, then it will return nothing — 0 bytes read.
You need to record and check the return value from read(). You should also check the return value from scanf() to ensure it did actually read a number.
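As a sketch of checking both return values while staying entirely within standard I/O, so that nothing buffered by stdio is lost, consider the following. The function name read_int_then_bytes is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer, then up to n raw bytes, both
   through the same FILE * so they share one buffer.
   buf must have room for n + 1 bytes; it is always NUL-terminated.
   Returns the number of bytes stored in buf, or -1 if no integer
   could be parsed. */
long read_int_then_bytes(FILE *fp, int *value, char *buf, size_t n)
{
    if (fscanf(fp, "%d", value) != 1)
        return -1;                       /* scanf family: check the item count */

    size_t got = fread(buf, 1, n, fp);   /* may be fewer than n at end of file */
    buf[got] = '\0';
    return (long)got;
}
```

With the input 10abcd, a call such as read_int_then_bytes(stdin, &data, buf, 4) stores 10 in data and "abcd" in buf.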
Try man read first.
read is declared as ssize_t read(int fd, void *buf, size_t count);
and stdin is declared as FILE *. That's the issue. Use fread() instead and you will be sorted.
#include <stdio.h>
int main(void)
{
    char buf[100] = {'\0'};
    int data = 0;
    scanf("%d", &data);
    fread(buf, 1, 4, stdin);   /* reads through the same stdio buffer as scanf */
    printf("buffer is %s\n", buf);
    return 0;
}
EDIT: Your understanding is almost correct, but not totally.
To address your question properly, I agree with Jonathen Laffer.
How your code works:
1) scanf places 10 in data.
2) abcd will still be in the stdin buffer when you press ENTER.
3) read() will then wait for input again, and you have to press ENTER again for the program to continue.
4) If you typed anything before pressing ENTER the second time, printf prints it; otherwise you get nothing in the output other than the text of your printf statement.
That's why I asked you to use fread instead. Hope it helps.
/*
Low Level I/O - Read and Write
Chapter 8 - The C Programming Language - K&R
Header file in the original code is "syscalls.h"
Also BUFSIZ is supposed to be defined in the same header file
*/
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#define BUFSIZ 1
int main() /* copy input to output */
{
char buf[BUFSIZ];
int n;
while ((n = read(0, buf, BUFSIZ)) > 0)
write(1, buf, n);
return 0;
}
When I feed "∂∑∑®†¥¥¥˚π∆˜˜∫∫√ç tu 886661~EOF" as input the same is copied.
How are so many non-ASCII characters stored at the same time?
BUFSIZ is the number of bytes to be transferred.
How does BUFSIZ limit the byte transfer if, for any value, anything can be copied from input to output?
How does char buf[BUFSIZ] store non-ASCII characters?
You read by little chunks until EOF:
while ((n = read(0, buf, BUFSIZ)) > 0)
That's why. You literally copy input to output, byte by byte. Converting the bytes back to Unicode is the console's problem, not yours. I guess it does not display anything until it can recognize the data as a complete symbol.
Since you are calling read in a loop until end of file is reached or an error is encountered, you get exactly 1 byte in buf after each call to read. That byte is then written out via the write system call. read is guaranteed to read no more than the count given in its last argument. If you passed 10 there, for example, while buf is still 1 byte long, read would go ahead and copy data beyond the array bounds.
As for the characters you have fed in: on a UTF-8 terminal each of them is probably encoded as a sequence of two or three bytes, and every one of those bytes has a value in the range 128-255. read and write pass the bytes through unchanged, so there is no problem here.
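Assuming the terminal uses UTF-8 (common on current macOS and Linux systems, but an assumption here), the difference between bytes and characters can be made visible with a small sketch. The function name utf8_code_points is made up, and the code assumes its input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a valid UTF-8 string by
   counting every byte that is not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For a string holding ∂ (3 bytes), π (2 bytes) and " tu" (3 bytes), strlen reports 8 bytes while utf8_code_points reports 5 characters. A one-byte buffer simply carries those 8 bytes one at a time.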
When you call read on standard input, you are reading from a channel that is bound to the terminal or to another program. There is of course a buffer (or several) between the writer (terminal or other program) and your program. When this buffer is empty, the reader (your program) blocks in read. When the buffer is full, the writer (terminal etc.) blocks in write.
When you write to standard output, you are writing to a channel that is bound to the terminal or to another program.
So if your program is run by the shell from a terminal, then your program's input and output are bound to the (pseudo)terminal. A (pseudo)terminal is a program that converts the user's key presses to characters and converts encoded strings (ISO8859-1, UTF-8 etc.) to symbols on the screen.
Characters are held by the terminal until you press EOF or EOL. This is the canonical mode of the terminal. After you press Enter, the bytes are written to the channel bound to your program.
BUFSIZ is the number of bytes that you try to read from the input in one operation. The return value n is the number of bytes actually read when the operation completes. So BUFSIZ is the maximum number of bytes your program can read in a single call.
char buf[BUFSIZ] is an array of bytes (not characters of some charset), so it can hold any values (including non-printable ones and even zero).
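To make the point about the buffer size concrete, here is a hedged sketch of a copy loop whose chunk size is a parameter. The helper name copy_fd and the 4096-byte cap are arbitrary choices for this example. Whatever chunk size is used, the same bytes come out; only the number of read calls changes.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd, reading at
   most `chunk` bytes per read() call (capped at 4096).
   Returns the total number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd, size_t chunk)
{
    char buf[4096];
    long total = 0;

    if (chunk == 0 || chunk > sizeof buf)
        chunk = sizeof buf;

    for (;;) {
        ssize_t n = read(in_fd, buf, chunk);   /* n is how many bytes arrived */
        if (n == 0)
            return total;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {     /* write() may also be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Calling copy_fd(0, 1, 1) reproduces the K&R program with BUFSIZ set to 1. A larger chunk copies the same data with fewer system calls.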