unix Read & Write function - c

/*
Low Level I/O - Read and Write
Chapter 8 - The C Programming Language - K&R
Header file in the original code is "syscalls.h"
Also BUFSIZ is supposed to be defined in the same header file
*/
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#define BUFSIZ 1
int main() /* copy input to output */
{
    char buf[BUFSIZ];
    int n;

    while ((n = read(0, buf, BUFSIZ)) > 0)
        write(1, buf, n);
    return 0;
}
When I feed "∂∑∑®†¥¥¥˚π∆˜˜∫∫√ç tu 886661~EOF" as input, the same text is copied to the output.
How are so many non-ASCII characters stored at the same time?
BUFSIZ is the number of bytes to be transferred.
How is BUFSIZ limiting the transfer if, for any value, anything can be copied from input to output?
How does char buf[BUFSIZ] store non-ASCII characters?

You read in small chunks until EOF:
while ((n = read(0, buf, BUFSIZ)) > 0)
That's why. You literally copy the input to the output byte by byte. Converting those bytes back into Unicode characters is the console's job, not yours. My guess is that the console does not display anything until it has received enough bytes to recognize a complete character.

Since you are calling read in a loop until end of file is reached or an error is encountered, you get exactly 1 byte in buf after each successful call of read. That byte is then written out via the write system call. read is guaranteed to read no more than the count given in its last argument. If you passed 10, for example, while buf still holds only 1 byte, read would go ahead and copy the data it read beyond the array bounds.
As for the characters you have fed in: on a UTF-8 terminal each of them arrives as a sequence of several bytes with values in the range 128-255, and every such byte fits in a char, so there is no problem here.

When you call read on standard input, you are reading from a pipe-like channel that is bound to a terminal or to another program. There is of course a buffer (or several) between the writer (the terminal or the other program) and your program. When this buffer is empty, the reader (your program) blocks in read. When the buffer is full, the writer (the terminal, etc.) blocks in write.
When you write to standard output, you are likewise writing to a channel that is bound to a terminal or to another program.
So if your program is run by the shell from a terminal, then your program's input and output are bound to the (pseudo)terminal. A (pseudo)terminal is a program that converts the user's key presses to characters and converts encoded strings (ISO 8859-1, UTF-8, etc.) to symbols on the screen.
Characters are held by the terminal until you press EOF or EOL. This is the terminal's canonical mode. After you press Enter, the bytes are written to the channel bound to your program.
BUFSIZ is the number of bytes that you try to read from the input in one operation. The return value n is the number of bytes actually read when the operation completes. So BUFSIZ is the maximum number of bytes your program can read from the input in a single call.
char buf[BUFSIZ] is an array of bytes (not characters of some charset), so it can hold any values (including non-printable ones and even zero).

Related

getchar() keeps returning EOF even after subsequent calls but read() system calls seem to "clear" the stdin. What are the reasons behind this?

#include <stdio.h>
#include <unistd.h>

char buff[1];
int main() {
    int c;
    c = getchar();
    printf("%d\n", c); //output -1
    c = getchar();
    printf("%d\n", c); // output -1
    int res;
    //here I get a prompt for input. What happened to EOF ?
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    return 0;
}
The output shown in the comments in the code is the result of simply typing Ctrl-D (EOF on macOS).
I'm a bit confused about the behaviour of getchar(), especially when compared to read.
Shouldn't the read system calls inside the while loop also return EOF? Why do they prompt the user? Has some sort of stdin clear occurred?
Considering that getchar() uses the read system call under the hood, how come they behave differently? Shouldn't stdin be "unique" and the EOF condition shared?
How come in the following code the two read system calls both return EOF when a Ctrl-D input is given?
int res;
while ((res = read(0, buff, 1)) > 0) {
    printf("Hello\n");
}
while ((res = read(0, buff, 1)) > 0) {
    printf("Hello\n");
}
I'm trying to find the logic behind all this. I hope someone can make it clear what EOF really is and how it really behaves.
P.S. I'm using a macOS machine.
Once the end-of-file indicator is set for stdin, getchar() does not attempt to read.
Clear the end-of-file indicator (e.g. clearerr() or others) to re-try reading.
The getchar function is equivalent to getc with the argument stdin.
The getc function is equivalent to fgetc ...
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
read() still tries to read each time.
Note: Reading via a FILE *, like stdin, does not attempt to read if the end-of-file indicator is set. Yet even if the error indicator is set, a read attempt still occurs.
macOS is a derivative of BSD Unix systems. Its stdio implementation does not come from GNU software, so it is a different implementation. On EOF, that is, when the read(2) system call returns 0 as the number of bytes read, the stream is marked with an end-of-file indicator, and stdio does not call read(2) again until that indicator is reset; this produces the behaviour you observe. Call clearerr(stream); on the FILE * before issuing the next getchar(3) call, and everything will be fine. You can do that with glibc as well, and then your program will run the same with either implementation of stdio (glibc vs. BSD).
I'm trying to find the logic behind all this. I hope someone can make it clear what EOF really is and how it really behaves.
EOF is simply a constant (normally -1) that is different from any possible char value returned by getchar(3). On success getchar() returns an int in the interval 0..255 rather than a char precisely for this purpose: to extend the range of possible characters with one more value that represents the EOF condition. EOF itself is not a char. The getchar family of functions (getchar, fgetc, etc.) reports the end-of-file condition this way because the condition is signalled by a read(2) return value of 0 (zero bytes returned), which does not map to any character. For that reason the set of possible return values is widened to an int, and a new value, EOF, is defined to be returned when end of file is reached. This is compatible with files that contain Ctrl-D characters (ASCII EOT, decimal value 4) that do not represent an end-of-file condition: when you read an ASCII EOT from a file, it appears as a normal character with decimal value 4.
The Unix tty implementation, on the other hand, lets line input mode use a special character (Ctrl-D, ASCII EOT/END OF TRANSMISSION, decimal value 4) to indicate end of stream to the driver. It is a special character, like ASCII CR or ASCII DEL (which trigger line editing on the input before it is fed to the program). In that case the terminal simply makes all the characters typed so far available for the application to read (if there are none, none are read, and you get end of file). So Ctrl-D is special only in the Unix tty driver, and only when it is working in canonical mode (line input mode). In the end, there are only two ways to hand input to the program in line mode:
pressing the RETURN key (the terminal sends ASCII CR, which the tty driver translates into ASCII LF, the famous '\n' character), so that the line, ending in ASCII LF, is delivered to the program
pressing Ctrl-D, which makes the terminal take everything keyed in up to that moment and send it to the program (without the Ctrl-D itself). No character is added to the input buffer, which means that if the input buffer was empty, nothing is sent to the program and the read(2) call effectively reads zero characters.
To unify: in every scenario, the read(2) system call normally blocks in the kernel until one or more characters are available. Only at end of file does it unblock and return zero characters to the program. THIS SHOULD BE YOUR END OF FILE INDICATION. Many programs receive an incomplete buffer (fewer characters than the count passed as a parameter) before a true END OF FILE is signalled, so almost every program does another read to check whether that was a short read or indeed an end-of-file indication.
Finally, what if I want to enter a Ctrl-D character as itself into a file? There is another special character in the tty implementation that escapes the special behaviour of the character it precedes. On today's systems that character is Ctrl-V by default, so to enter a special character (even Ctrl-V itself) you precede it with Ctrl-V; entering Ctrl-D into the file therefore means typing Ctrl-V followed by Ctrl-D.

How to control the output of fileno?

I'm facing a piece of code that I don't understand:
read(fileno(stdin),&i,1);
switch(i)
{
    case '\n':
        printf("\a");
        break;
    ....
I know that fileno returns the file descriptor associated with stdin here, and then read puts this value into the variable i.
So, what should the value of stdin be for i to match the first "case", i.e. \n?
Thank you
But what should be the value of stdin to match with the first "case", i.e \n ?
The case statement doesn't look at the "value" of stdin.
read(fileno(stdin),&i,1);
reads in a single byte into i (assuming read() call is successful) and if that byte is \n (newline character) then it'll match the case. You probably need to read the man page of read(2) to understand what it does.
I know that fileno returns the file descriptor associated with stdin here,
Yes, though I suspect you don't know what that means.
and then read puts this value into the variable i.
No. No no no no no. read() does not put the value of the file descriptor, or any part of it, into the provided buffer (in your case, the bytes of i). As its name suggests, read() attempts to read from the file represented by the file descriptor passed as its first argument. The bytes read, if any, are stored in the provided buffer.
stdin represents the program's standard input. If you run the program from an interactive shell, that will correspond to your keyboard. The program attempts to read user input, and to compare it with a newline.
The program is likely flawed, and maybe outright wrong, though it's impossible to tell from just the fragment presented. If i is a variable of type int then its representation is larger than one byte, but you're only reading one byte into it. That will replace only one byte of the representation, with results depending on C implementation and the data read.
What the program seems to be trying to do can be made to work with read(), but I would recommend using getchar() instead:
#include <stdio.h>
/*
...
int i;
...
*/
i = getchar();
/* ... */

Print character to stdin as read by read line

I want to print every character that is read by readline as it's read in.
Currently I'm able to obviously print out everything after it's fully read in.
This is for a shell that is being written.
readline() is a blocking function and waits for a complete line.
Just use read() in a loop. A blocking read() waits until input is available and then returns 1 if one character was read (or n if n characters are waiting on stdin and you provide a big enough buffer); it returns 0 only at end of file and -1 on error. Keep in mind that a terminal in canonical mode delivers input to read() only once a whole line has been entered.
Generally, standard input has file descriptor 0 reserved for it, so your code may look like this:
char buf[1];
int i;
while (1) {
    i = read(0, buf, 1);
    if (i <= 0)
        break; // end of file or error
    // process character
}

fully-buffered stream get flushed when it is not full

I'm really confused about how exactly a buffer works, so I wrote a little snippet to check:
#include<stdio.h>
#define BUF_SIZE 1024
char buf[BUF_SIZE];
char arr[20];
int main()
{
    FILE* fs=fopen("test.txt","r");
    if (fs==NULL) /* test.txt is missing or unreadable */
        return 1;
    setvbuf(fs,buf,_IOFBF,1024);
    fread(arr,1,1,fs);
    printf("%s",arr);
    getchar();
    return 0;
}
As you can see, I set the file stream fs to be fully buffered (I know it would default to fully buffered most of the time; I'm just making sure). I also set its associated buffer to size 1024, which means that the stream would not be flushed until it contains 1024 bytes of data (right?).
As I understand it, fread() reads data from the file stream, stores it in the stream's buffer buf, and then sends the data in buf on to arr as soon as buf is full with 1024 bytes of data (right?).
But now I read only one character from the stream! And there are only four characters in the file test.txt. Why can I find something in arr when only one char was read (I can print that one character out)?
The distinctions between fully-buffered, line-buffered, and unbuffered really only matter for output streams. I'm pretty sure that input streams pretty much always act like they're fully buffered.
But even for fully-buffered input streams, there's at least one case where the buffer won't be fully full, and as you've discovered, that's where there aren't enough characters left in the input to fill the buffer. If there are only 4 characters in the file, then when the system goes to fill the buffer, it gets those 4 characters and puts them in the buffer, and then you can start taking them out, as usual.
(The same situation would arise any time the file contains a number of characters that's not an exact multiple of the buffer size. For example, if the input file contained 1028 characters, then after filling the buffer with the first 1024 characters and letting you read them, the next time it filled the buffer, it'd end up with 4 again.)
What were you expecting it to do in this case? Block waiting to read 1,020 more characters from the file (that were never going to come)?
P.S. You said "the stream would not be flushed until it contained 1024 bytes of stuff, right?" But flushing is only defined for output streams; it doesn't mean anything for input streams.
From what I understand, an input buffer works differently from what you suggested: if you request one byte to be read, the system reads up to 1023 more bytes into the buffer, so that the next 1023 read calls can return data directly from the buffer instead of having to read from the file.

using read system call after a scanf

I am confused about the following code:
#include<stdio.h>
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
read(stdin,buf,4); //attaching to stdin
printf("buffer is %s\n",buf);
return 1;
}
Suppose that at runtime I provide the input 10abcd. As I understand it, the following should happen:
scanf should place 10 in data
and abcd will still be in the stdin buffer
when read tries to read stdin (where abcd already is), it should place abcd into buf
so printf should print abcd
But that is not happening; printf shows no output.
Am I missing something here?
First of all read (stdin, ...) should give warnings (if you have them enabled) which you would be wise to heed. read() takes an integer as the first parameter specifying which channel to read from. stdin is of type FILE *.
Even if you changed it to read(0, ...), this is not recommended practice. scanf reads from FILE *stdin, which is buffered on top of file descriptor 0. read(0, ...) reads directly from the underlying file descriptor and ignores any characters that were already buffered. This will cause strange results unless stdin is set to unbuffered.
Ignoring mechanical issues related to the syntax of the read() function call, there are two cases to consider:
Input is from a terminal.
Input is from a file.
Terminal
No data will be available for reading until the user hits return. At that point, the standard I/O library will read all the available data into the buffer associated with stdin (that would be "10abcd\n"). It will then parse the number, leaving the a in the buffer to be read later by other standard I/O functions.
When the read() occurs, it will also wait for the user to provide some input. It has no clue about the data in the stdin buffer. It will hang until the user hits return, and will then read the next lot of data, returning up to 4 bytes in the buffer (no null termination unless it so happens that the fourth character is an ASCII NUL '\0').
File
Actually, this isn't all that much different, except that instead of reading a line of data into the buffer, the standard I/O library will probably read an entire buffer full, (BUFSIZ bytes, which might be 512 or larger). It will then convert the 10 and leave the a for later use. (If the file is shorter than the buffer size, it will all be read into the stdin buffer.)
The read will then collect the next 4 bytes from the file. If the whole file was read already, then it will return nothing — 0 bytes read.
You need to record and check the return value from read(). You should also check the return value from scanf() to ensure it did actually read a number.
Try man read first.
read is declared as ssize_t read(int fd, void *buf, size_t count);
and stdin is declared as FILE *. That's the issue. Use fread() instead and you will be sorted.
#include <stdio.h>
int main()
{
    char buf[100]={'\0'};
    int data=0;
    scanf("%d",&data);
    fread(buf, 1, 4, stdin);
    printf("buffer is %s\n",buf);
    return 0;
}
EDIT: Your understanding is almost correct, but not completely.
To address your question properly, I agree with Jonathen Laffer.
Here is how your code works:
1) scanf places 10 in data.
2) abcd is still in the stdin buffer after you press ENTER.
3) read() then waits for input again, and you have to press ENTER a second time for the program to continue.
4) If you typed anything before pressing ENTER the second time, printf prints it; otherwise you get nothing on output other than the fixed text of your printf statement.
That's why I suggested using fread instead. Hope it helps.

Resources