In the following snippet, no matter how long an input I provide (EDIT: I'm copy-and-pasting a random string), say a string of 9998 characters, read() stops when i = 4095. It behaves as if it hit an EOF, but my string does not contain an EOF character (for example, I tried a string of 9998 'a's). The return value also suggests there is no error from read(). Why does read() read in only 4095 bytes?
#include <unistd.h>
#include <stdio.h>

int main() {
    char temp;
    char buf[10000];
    int i = 0;
    while (read(STDIN_FILENO, &temp, 1) > 0) {
        buf[i] = temp;
        i++;
    }
    printf("%d\n", i);
}
Edit: To clarify, read() doesn't literally state that it read an EOF character; per https://linux.die.net/man/2/read, read() returns 0 when it reaches end of file.
You're most likely seeing the terminal's line buffer limit: a terminal in canonical mode can only buffer a limited number of characters for a single line, and if you enter more than that (whether by typing, through a pseudo-terminal, or by pasting) without an NL, EOL, or EOL2 character, you'll get an error, which the terminal indicates the same way as an EOF (read() returning 0).
You can generally avoid this problem by putting the terminal into non-canonical mode (where it doesn't try to buffer lines to allow backspacing).
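For illustration, here is a minimal sketch (assuming a POSIX system; error checking omitted, and the paste has to begin within the 0.5 s timeout) of the same read() loop with the terminal switched to non-canonical mode, so the line-buffer limit no longer applies:

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
    struct termios old, raw;
    char temp, buf[10000];
    int i = 0;

    tcgetattr(STDIN_FILENO, &old);           /* save current settings */
    raw = old;
    raw.c_lflag &= ~(ICANON | ECHO);         /* no line buffering, no echo */
    raw.c_cc[VMIN] = 0;                      /* read() may return with 0 bytes ... */
    raw.c_cc[VTIME] = 5;                     /* ... after 0.5 s with no input      */
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    /* read() now returns bytes as they arrive; it returns 0 on a timeout,
       which ends the loop once the pasted input has been consumed. */
    while (i < (int)sizeof buf && read(STDIN_FILENO, &temp, 1) > 0)
        buf[i++] = temp;

    tcsetattr(STDIN_FILENO, TCSANOW, &old);  /* restore the terminal */
    printf("%d\n", i);
    return 0;
}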
I am trying to get a sample program (like a shell script) working that writes to a file:
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv){
    char buff[1024];
    size_t len, idx;
    ssize_t wcnt;

    for (;;){
        if (fgets(buff, sizeof(buff), stdin) == NULL)
            return 0;
        idx = 0;
        len = strlen(buff);
        do {
            wcnt = write(1, buff + idx, len - idx);
            if (wcnt == -1){ /* error */
                perror("write");
                return 1;
            }
            idx += wcnt;
        } while (idx < len);
    }
}
So my problem is this: let's say I want to write a file of 20000 bytes, but each time I can only write at most 1024 bytes (the buffer size).
Let's say that on my first pass everything goes perfectly: fgets() fills the buffer, and in my first do-while I write all of it.
Then, since we wrote len bytes, we exit the do-while loop.
So now what? The buffer is full from our previous read. It seems to be implied that fgets() will somehow continue reading from the point in the input file where it stopped last time (around byte 1024 here).
How does fgets() know where it stopped reading in the input file?
I checked the man page:
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
fgets() returns s on success, and NULL on error or when end of file occurs while no characters have been read.
So from that, I get that it returns a pointer to the first element of buf, which is always buf[0], and that's why I am confused.
When using a FILE stream, the stream contains information about the position in the file (among other things). fgets and other functions like fread or fwrite merely use this information and update it when an operation is performed.
So, whenever fgets reads from the stream, the stream will be updated to maintain the position, so that the next operation starts off where the previous ended.
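You can watch that bookkeeping happen; a small sketch (the file name input.txt is just an example) that prints ftell() after every fgets() call:

#include <stdio.h>

int main(void) {
    char buff[1024];
    FILE *fp = fopen("input.txt", "r");  /* example file name */
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    /* Each fgets() advances the stream's stored position, which is where
       the next fgets() will resume reading. */
    while (fgets(buff, sizeof(buff), fp) != NULL)
        printf("position after this fgets: %ld\n", ftell(fp));
    fclose(fp);
    return 0;
}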
The program I tried writing should read a string of at most 8 characters and check whether that string is present in a file. I decided to use the 'read' system call for it, but I've come across some strange behavior of this function. As the manual says, it must return 0 when the end of file is reached, but in my case, when there were no more characters to read, it still read a '\n' and returned 1 (the number of bytes read). (I've checked the ASCII code of the character it read, and it's actually 10, which is '\n'.) Considering this, I changed my code and it worked, but I still can't understand why it behaves this way. Here is the code of my function:
int is_present(int fd, char *string)
{
    int i;
    char ch, buf[9];

    if (!read(fd, &ch, 1)) /* file is empty */
        return 0;
    while (1) {
        i = 0;
        while (ch != '\n') {
            buf[i++] = ch;
            read(fd, &ch, 1);
        }
        buf[i] = '\0';
        if (!strncmp(string, buf, strlen(buf))) {
            close(fd);
            return 1;
        }
        if (!read(fd, &ch, 1)) /* EOF reached */
            break;
    }
    close(fd);
    return 0;
}
I think that your problem is in the inner read() call: there you are not checking the function's return value.
while (ch != '\n') {
    buf[i++] = ch;
    read(fd, &ch, 1);
}
If the file happens to end without a '\n' (so read() hits end of file inside that loop while ch is not '\n'), it will be an infinite loop, because read() will no longer modify the value of ch. BTW, you are not checking the bounds of buf either.
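A sketch of how that inner loop might look with both issues addressed (keeping the rest of your function unchanged; buf has room for 8 characters plus the terminator):

while (ch != '\n') {
    if (i < 8)                     /* stay within buf[9]: 8 chars + '\0' */
        buf[i++] = ch;
    if (read(fd, &ch, 1) <= 0)     /* 0 means EOF, -1 means error: stop either way */
        break;
}
buf[i] = '\0';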
I'm assuming the question is 'why does read() work this way' and not 'what is wrong with my program?'.
This is not an error. From the manual page:
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
If you think about it read must work this way. If it returned 0 to indicate an end of file was reached when some data had been read, you would have no idea how much data had been read. Therefore read returns 0 only when no data is read because of an end-of-file condition.
Therefore in this case, where there is only a \n available, read() will succeed and return 1. The next read will return a zero to indicate end of file.
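If you want to see it for yourself, a quick sketch (the file name words.txt is made up) that prints what every read() call returns near the end of a file:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char ch;
    ssize_t n;
    int fd = open("words.txt", O_RDONLY);  /* hypothetical file ending in '\n' */
    if (fd == -1) {
        perror("open");
        return 1;
    }
    while ((n = read(fd, &ch, 1)) > 0)
        printf("read returned %zd, byte = %d\n", n, ch);
    printf("final read returned %zd (end of file)\n", n);  /* 0, not an error */
    close(fd);
    return 0;
}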
The read() function keeps reading characters and placing them in the buffer until it reaches EOF. Here, '\n' is just another character, so it reads that too. Your code would have stopped after reading the '\n', since there was nothing left but end of file. Only end of file terminates read(); every other character is treated as ordinary data. Cheers!
I have a program that reads a file into a buffer structure. The problem I'm having is that when I look at the output of the file, there's an extra EOF character at the end. I'll post the related functions (NOTE: I removed parameter checks and only posted the code in each function related to the issue):
b_load
int b_load(FILE * const fi, Buffer * const pBD){
    unsigned char character; /* Variable to hold a character read from the file */
    Buffer * tempBuffer;     /* Temporary Buffer * to prevent destruction of the main Buffer */
    short num_chars = 0;     /* Count of the characters read into the buffer */

    /* Assign the main Buffer to tempBuffer */
    tempBuffer = pBD;
    /* Infinite loop that breaks after EOF is read */
    while (1) {
        /* Call fgetc() and store the returned char in the character variable */
        character = (unsigned char)fgetc(fi);
        if (!feof(fi)) {
            tempBuffer = b_addc(pBD, character);
            if (tempBuffer == NULL)
                return LOAD_FAIL;
            ++num_chars;
        } else {
            break;
        }
    }
    return num_chars;
}
b_print
int b_print(Buffer * const pBD){
    int num_chars = 0;

    if (pBD->addc_offset == 0)
        printf("The buffer is empty\n");
    /* Set getc_offset to 0 */
    b_set_getc_offset(pBD, 0);
    pBD->eob = 0;
    /* b_eob returns the structure's eob field */
    while (!b_eob(pBD)) {
        printf("%c", b_getc(pBD));
        ++num_chars;
    }
    printf("\n");
    return num_chars;
}
b_getc
char b_getc(Buffer * const pBD){
    if (pBD->getc_offset == pBD->addc_offset) {
        pBD->eob = 1;
        return R_FAIL_1;
    }
    pBD->eob = 0;
    return pBD->ca_head[(pBD->getc_offset)++];
}
At the end I end up with:
"a catÿ"
(the ÿ is the EOF character)
It prints an EOF character, but one is never added to the buffer. When the driver code adds an EOF character to the end of the buffer, two appear. Any idea what is causing this? I might be using feof() wrong, so that may be it, but it is required in the code.
There is no "EOF character". EOF is a value returned by getchar() and related functions to indicate that they have no more input to read. It's a macro that expands to a negative integer constant expression, typically (-1).
(For Windows text files, an end-of-file condition may be triggered by a Control-Z character in a file. If you read such a file in text mode, you won't see that character; it will just act like it reached the end of the file at that point.)
Don't use the feof() function to detect that there's no more input to read. Instead, look at the value returned by whatever input function you're using. Different input functions use different ways to indicate that they weren't able to read anything; read the documentation for whichever one you're using. For example, fgets() returns a null pointer, getchar() returns EOF, and scanf() returns the number of items it was able to read.
getchar(), for example, returns either the character it just read (treated as an unsigned char and converted to int) or the value EOF to indicate that it wasn't able to read anything. The negative value of EOF is chosen specifically to avoid colliding with any valid value of type unsigned char. Which means you need to store the value returned by getchar() in an int object; if you store it in a char or unsigned char instead, you can lose information, and an actual character with the value 0xff can be mistaken for EOF.
The feof() function returns the value of the end-of-file indicator for the file you're reading from. That indicator becomes true after you've tried and failed to read from the file. And if you ran out of input because of an error, rather than because of an end-of-file condition, feof() will never become true.
You can use feof() and/or ferror() to determine why there was no more input to be read, but only after you've detected it by other means.
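For example, a short sketch of that order of operations: detect the end of input from the return value first, then ask feof()/ferror() why it ended:

#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != EOF) {
        /* process c */
    }
    /* Only after getchar() has returned EOF do feof()/ferror() tell us why. */
    if (ferror(stdin))
        fprintf(stderr, "read error\n");
    else if (feof(stdin))
        fprintf(stderr, "clean end of file\n");
    return 0;
}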
Recommended reading: Section 12 of the comp.lang.c FAQ, which covers stdio. (And the rest of it.)
UPDATE:
I haven't seen enough of your code to understand what you're doing with the Buffer objects. Your input loop actually looks (almost) correct, though it's written in a clumsy way.
The usual idiom for reading characters from a file is:
int c; /* `int`, NOT `char` or `unsigned char` */
while ((c = fgetc(fi)) != EOF) {
    /* process character in `c` */
}
But your approach, which I might rearrange like this:
while (1) {
    c = fgetc(fi);
    if (feof(fi) || ferror(fi)) {
        /* no more input */
        break;
    }
    /* process character in c */
}
should actually work. Note that I've added a check for ferror(fi). Could it be that you have an error on input (which you're not detecting)? That would cause c to contain EOF, or the value of EOF converted to the type of c. That's doubtful, though, since it would probably give you an infinite loop.
Suggested approach: Using either an interactive debugger or added printf calls, show the value of character every time through the loop. If your input loop is working correctly, then build a stripped-down version of your program with a hard-wired sequence of calls to b_addc(), and see if you can reproduce the problem that way.
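For instance, a one-line addition inside your loop (a hypothetical debugging tweak, not part of the original code) would show what is actually stored; when fgetc() returns EOF (-1) and the result is narrowed to unsigned char, you will see 255 here:

character = (unsigned char)fgetc(fi);
/* Temporary debug output: a genuine 0xFF byte and a converted EOF both print as 255,
   which is exactly the ambiguity described above. */
printf("debug: character = %d\n", (int)character);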
There you go ...
int b_load(FILE * const fi, Buffer * const pBD){
    int character;          /* Variable to hold a character read from the file */
    Buffer * tempBuffer;    /* Temporary Buffer * to prevent destruction of the main Buffer */
    short num_chars;        /* Count of the characters read into the buffer */

    /* Infinite loop that breaks WHEN EOF is read */
    for (num_chars = 0; 1; num_chars++) {
        character = fgetc(fi);
        if (character == EOF || feof(fi)) break; /* since you insist on the silly feof() ... */
        tempBuffer = b_addc(pBD, (unsigned char) character);
        if (tempBuffer == NULL) return LOAD_FAIL;
    }
    return num_chars;
}
I am just a beginner in programming, and I am working through K&R's book, The C Programming Language. While reading, I have become more and more curious about this question:
When there is a loop that gets characters one by one from the input and I put an output function inside the loop, I expected the result to be that each character is printed right after it is entered. However, the computer seems to print out a whole batch of characters only after I press Enter.
Such as the answer of exercise 1-22 from K&R's book:
/* K&R Exercise 1-22 p.34
*
* Write a program to "fold" long input lines into two or more
* shorter lines after the last non-blank character that occurs before the n-th
* column of input. Make sure your program does something intelligent with very
* long lines, and if there are no blanks or tabs before the specified column.
*/
#include <stdio.h>

#define LINE_LENGTH 80
#define TAB     '\t'
#define SPACE   ' '
#define NEWLINE '\n'

void entab(int);

int main()
{
    int i, j, c;
    int n = -1; /* The last column with a space. */
    char buff[LINE_LENGTH + 1];

    for ( i=0; (c = getchar()) != EOF; ++i )
    {
        /* Save the SPACE to the buffer. */
        if ( c == SPACE )
        {
            buff[i] = c;
        }
        /* Save the character to the buffer and note its position. */
        else
        {
            n = i;
            buff[i] = c;
        }
        /* Print the line and reset counts if a NEWLINE is encountered. */
        if ( c == NEWLINE )
        {
            buff[i+1] = '\0';
            printf("%s", buff);
            n = -1;
            i = -1;
        }
        /* If the LINE_LENGTH was reached instead, then print up to the last
         * non-space character. */
        else if ( i == LINE_LENGTH - 1 )
        {
            buff[n+1] = '\0';
            printf("%s\n", buff);
            n = -1;
            i = -1;
        }
    }
}
I supposed the program would print a line of 80 characters as soon as I had entered 80 characters (before I had even pressed the Enter key). However, that's not what happens! I can keep typing the whole string, no matter how many characters it has. When I finally decide to finish the line, I press Enter, and then it gives me the right output: the long string is cut into several short lines of 80 characters each (and of course the last one may contain fewer than 80 characters).
I wonder why that happens?
Usually (and in your case), stdin is line-buffered, so your programme doesn't receive the characters as they are typed, but in chunks, when the user enters a newline (Return), or, possibly, when the system buffer is full.
So when the user input is finally sent to your programme, it is copied into the programme's input buffer. That's where getchar() reads the characters from to fill buff.
If the input is long enough, buff will be filled with LINE_LENGTH characters from the input buffer and then printed several times (until the entire contents of the input buffer have been consumed).
On Linux (I think on Unix-ish systems generally, but I'm not sure), you can also send the input to the programme without entering a newline by typing Ctrl+D on a non-empty line. (As the first input on a line, Ctrl+D closes stdin; later in an input line, you can close stdin by typing it twice.) So if you type Ctrl+D after entering LINE_LENGTH [or more] characters without a newline, [at least the initial part of] the input is printed immediately.
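A minimal way to see this buffering in action (nothing here changes the terminal settings): the loop below echoes each character the moment getchar() returns it, yet on a line-buffered terminal the echoed copy only appears after you press Return, because that is when the whole line is handed to the programme:

#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != EOF) {
        putchar(c);
        fflush(stdout);  /* rules out output buffering: any delay you see is on the input side */
    }
    return 0;
}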
I'm very glad to see you here because I also come from China. Actually, my English is poor, and I am a beginner in programming too, so I may not have understood your purpose very well.
However, I found a problem in your code: the loop may not stop. EOF means end of file, but you didn't open any files.
What I know about getchar() is that it reads standard input into a buffer; getchar() is often used to stop the input (or the loop).
I hope my answer could help you.
I'm writing a program that encrypts a file by adding 10 to each character. Somehow a portion of the program's working directory is being printed to the file, and I have no idea why.
#include <stdio.h>

int main(void){
    FILE *fp;
    fp = fopen("tester.csv", "r+");
    Encrypt(fp);
    fclose(fp);
}

int Encrypt(FILE *fp){
    int offset = 10;
    Shift(fp, offset);
}

int Decrypt(FILE *fp){
    int offset = -10;
    Shift(fp, offset);
}

int Shift(FILE *fp, int offset){
    char line[50], tmp[50], character;
    long position;
    int i;

    position = ftell(fp);
    while (fgets(line, 50, fp) != NULL) {
        for (i = 0; i < 50; i++) {
            character = line[i];
            character = (offset + character) % 256;
            tmp[i] = character;
            if (character == '\n' || character == 0) { break; }
        }
        fseek(fp, position, SEEK_SET);
        fputs(tmp, fp);
        position = ftell(fp);
        fseek(stdin, 0, SEEK_END);
    }
}
The file originally reads:
this, is, a, test
i, hope, it, works!
After the program is run:
~rs}6*s}6*k6*~o}~
/alexio/D~6*y|u}+
k6*~o}~
/alexio/D
where users/alexio/Desktop is part of the path. How does this happen???
Because you "encode" the string, either it won't be null-terminated (that's your case), or it will contain a null byte even before the end of the string (when (character + offset) % 256 == 0). Later you try to write it as a string, which overruns your buffer and outputs part of your program arguments.
Use fread and fwrite.
The line
fputs(tmp,fp);
writes out a string that is probably not null-terminated, so it continues to copy memory to the file until it finds a null byte.
You need to add a null to the end of 'tmp' in the case where the loop breaks on a newline.
A number of things:
You're encoding all 50 chars from your read buffer, regardless of how many were actually read with fgets(). Recall that fgets() reads a line, not an entire buffer (unless the line is longer than the buffer, and yours is not). Anything in the buffer past the length of the line you actually read is stack garbage.
You're then dumping all that extra garbage data, and beyond, by not terminating your tmp[] string before writing with fputs(), which you should not be using anyway. Yet more stack garbage.
Solution: use fread() and fwrite() for this encoding. There is no reason to be using string functions whatsoever. When you write your decoder, you'll thank yourself for using fread() and fwrite().
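A sketch of what the shifting loop could look like built on fread()/fwrite() instead of the string functions (same in-place rewrite over a stream opened with "r+" as in the original; the 50-byte chunk size is kept only to mirror the question):

/* Shift every byte of the file by `offset`, writing each block back over
   the bytes it was read from. No string functions are involved, so embedded
   '\0' bytes and missing terminators cannot cause garbage to be written. */
int Shift(FILE *fp, int offset) {
    unsigned char block[50];
    size_t n, i;
    long position = ftell(fp);

    while ((n = fread(block, 1, sizeof block, fp)) > 0) {
        for (i = 0; i < n; i++)
            block[i] = (unsigned char)((block[i] + offset) % 256);
        fseek(fp, position, SEEK_SET);   /* back to where this block started */
        fwrite(block, 1, n, fp);
        fseek(fp, 0, SEEK_CUR);          /* required when switching from writing back to reading */
        position = ftell(fp);
    }
    return 0;
}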