What does write() write if null terminator is already reached? - c

For write(fd[1], string, size) - what would happen if string is shorter than size?
I looked up the man page but it doesn't clearly specify that situation. I know that for read it would simply stop there and read whatever the string is, but that's certainly not the case for write. So what is write doing? The return value is still size, so is it appending a null terminator? Why doesn't it just stop like read?

When you call write(), the system assumes you are writing generic data to some file - it doesn't care that you have a string. A null-terminated string is seen as a bunch of non-zero bytes followed by a zero byte - the system will keep writing out until it's written size bytes.
Thus, specifying a size which is longer than your string can be dangerous: the system will read data beyond the end of the string and write it out to your file, probably garbage.

write will write size bytes of data starting at string. If you define string to be an array shorter than size, the behaviour is undefined. But in your previous question, char *line = "apple"; contains 6 characters (i.e. a, p, p, l, e and the null character).
So it is best to call write with the value of size set to the correct value.
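For illustration, a minimal sketch (send_string is a hypothetical helper; error checking omitted):
#include <string.h>
#include <unistd.h>

/* sketch only: fd is assumed to be the write end of the pipe from the question */
void send_string(int fd)
{
    const char *line = "apple";          /* 5 characters plus the terminating '\0' */
    write(fd, line, strlen(line));       /* writes exactly the 5 bytes "apple" */
    /* use strlen(line) + 1 instead if the reader expects the '\0' as well */
}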

write(int fildes, const void *buf, size_t nbyte) does not write null terminated strings. It writes the content of a buffer. If there are any null characters in the buffer they will be written as well.
read(int fildes, void *buf, size_t nbyte) also pays no attention to null characters. It reads a number of bytes into the given buffer, up to a maximum of nbyte. It does not add any null terminating bytes.
These are low level routines, designed for reading and writing arbitrary data.
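For example, a small sketch (dump_raw is a hypothetical helper) showing that write() happily copies bytes past an embedded zero:
#include <unistd.h>

void dump_raw(int fd)
{
    char buf[6] = { 'a', 'b', '\0', 'c', 'd', '\0' };
    write(fd, buf, sizeof buf);   /* all 6 bytes are written, both zero bytes included */
}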

The write call outputs a buffer of the given size. It does not attempt to interpret the data in the buffer. That is, you give it a pointer to a memory location and a number of bytes to write (the length) then, as long as those memory locations exist in a legal portion of your program's data, it will copy those bytes to the output file descriptor.
Unlike the string manipulation routines, write (and read, for that matter) ignores null bytes, that is, bytes with the value zero. read does pay attention to end-of-file and, on certain devices, will only read the amount of data available at the time, perhaps returning less data than requested, but both functions operate on raw bytes without interpreting them as "strings".
If you attempt to write more data than the buffer contains, it may or may not work depending on the layout of memory. At best the behavior is undefined. At worst you'll get a segmentation fault and your program will crash.

Related

Can `snprintf()` read out of bounds when a string argument is not null-terminated?

I have the following piece of code, which a colleague claims may contain an out-of-bounds read, which I do not agree with. Could you help settle this argument and explain why?
char *test_filename = malloc(Size + 1);
sprintf(test_filename, "");
if (Size > 0 && Data)
snprintf(test_filename, Size + 1, "%s", Data);
where Data is a non-null-terminated string of type const uint8_t *Data and Size is the size of Data, i.e., number of bytes in Data, of type size_t.
It may read out-of-bounds because the format string is %s, perhaps?
Your colleague is correct. Perhaps unintuitively, snprintf(test_filename, Size + 1, "%s", Data) is guaranteed to read bytes starting at Data until a 0 byte is encountered, in your case typically resulting in an out-of-bounds read.
It will only write Size of these bytes to test_filename and null terminate them, respecting the size limit of the destination; but it will continue to read on. The reason for that is a design choice which enables the caller to determine the needed destination size for dynamic allocation before anything is actually written: snprintf() returns the number of bytes which would be written if the destination had infinite space. This feature is supposed to be used with a destination size of 0 (and potentially a null pointer as the destination). This functionality is useful for arguments which are not strings: With numbers etc. the size of the output is difficult to predict (e.g. locale dependent) and best left to the function at run time.
At the same time the return value indicates whether the output was truncated: if it is greater than or equal to the size parameter, not all of the input was used in the output. In your case, what was left out were the bytes starting at Data[Size] and ending with the first 0 byte, or a segmentation fault ;-).
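For illustration, the usual two-pass sizing pattern that this design enables looks roughly like this (format_alloc is a hypothetical helper; error handling kept minimal):
#include <stdio.h>
#include <stdlib.h>

char *format_alloc(int value)
{
    /* first pass: null destination and size 0, only the required length is computed */
    int needed = snprintf(NULL, 0, "value=%d", value);
    if (needed < 0)
        return NULL;
    char *out = malloc((size_t)needed + 1);                 /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t)needed + 1, "value=%d", value);   /* second pass: actual write */
    return out;
}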
Suggestion for a fix: First of all it is unclear why you would use the printf family to print a string; simply copy it. And then Andrew has a point in his comments that since Data is not null terminated it is not really a string (even if all bytes are printable); so don't start fiddling with strcpy and friends but simply memcpy() the bytes, and null terminate manually.
Oh, and the preceding sprintf(test_filename, ""); does not serve any discernible purpose. If you want to write a null byte to *test_filename, simply do so; but since you are not using strcat, which would rely on an already terminated destination string to extend, it is quite unnecessary.
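A sketch of that suggestion, wrapped in a hypothetical make_filename helper and keeping the original Data and Size names:
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

char *make_filename(const uint8_t *Data, size_t Size)
{
    char *test_filename = malloc(Size + 1);
    if (test_filename == NULL)
        return NULL;
    if (Size > 0 && Data) {
        memcpy(test_filename, Data, Size);   /* reads exactly Size bytes, never past Data */
        test_filename[Size] = '\0';          /* terminate manually */
    } else {
        test_filename[0] = '\0';             /* empty string when there is no data */
    }
    return test_filename;
}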
From the man page for snprintf():
The functions snprintf() and vsnprintf() write at most size bytes (including the terminating null byte ('\0')) to str.
Note the "at most size bytes".
This means that snprintf() will stop writing to str once the size parameter's worth of bytes has been transferred.
This statement:
sprintf(test_filename, "");
is completely unneeded and has no effect on the operation of the later call to snprintf().
If you want to end up with a 'proper' string, suggest:
char *test_filename = calloc( Size + 1, sizeof( char ) );
if (Size > 0 && Data)
snprintf(test_filename, Size + 1, "%s", Data);
However, the function snprintf() keeps reading until a NUL byte is encountered. This can create problems, up to and including a segmentation fault.
The function memcpy() is made for this kind of job. Suggest replacing the call to snprintf() with:
memcpy( test_filename, Data, Size );

Will assigning a large value for length of char string be an issue?

I am reading a line from a file and I do not know the length it is going to be. I know there are ways to do this with pointers, but I am specifically asking about just a plain char string. For example, if I initialize the string like this:
char string[300]; //(or bigger)
Will having large string values like this be a problem?
Any hard coded number is potentially too small to read the contents of a file. It's best to compute the size at run time, allocate memory for the contents, and then read the contents.
See Read file contents with unknown size.
char string[300]; //(or bigger)
I am not sure which of the two issues you are concerned with, so I will try to address both below:
If the string in the file is larger than 300 bytes and you try to "stick" that string into that buffer without accounting for the max length of your array, you will get undefined behaviour because you overwrite past the end of the array.
If you are just asking whether 300 bytes is too much to allocate, then no, it is not a big deal unless you are on some very restricted device. E.g. in Visual Studio the default stack size (where that array would be stored) is 1 MB, if I am not wrong. The benefit of doing so is understandable, e.g. you don't need to concern yourself with freeing it etc.
PS. So if you are sure the buffer size you specify is enough, this can be a fine approach, as you free yourself from the memory-management issues that come with pointers and dynamic memory.
Will having large string values like this be a problem?
Absolutely.
If your application must read the entire line from a file before processing it, then you have two options.
1) Allocate a buffer large enough to hold the line of maximum allowed length. For example, the SMTP protocol does not allow lines longer than 998 characters. In that case you can allocate a static buffer of length 1001 (998 + \r + \n + \0). Once you have read a line from a file (or from a client, in the example context) which is longer than the maximum length (that is, you have read 1000 characters and the last one is not \n), you can treat it as a fatal (protocol) error and report it.
2) If there are no limitations on the length of the input line, the only thing you can do to ensure your program's robustness is to allocate buffers dynamically as the input is read. This may involve storing multiple malloc-ed buffers in a linked list, or calling realloc when buffer exhaustion is detected (this is how the getline function works, although it is not specified in the C standard, only in POSIX.1-2008); a sketch follows below.
In either case, never use gets to read the line. Call fgets instead.
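For reference, a minimal sketch of option 2 using the POSIX getline mentioned above (not part of the C standard; the file name input.txt is just a placeholder):
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(void)
{
    FILE *fp = fopen("input.txt", "r");
    if (fp == NULL)
        return 1;
    char *line = NULL;    /* getline allocates and grows this buffer as needed */
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, fp)) != -1) {
        /* line is null terminated and usually still contains the trailing '\n' */
        printf("read %ld characters\n", (long)len);
    }
    free(line);
    fclose(fp);
    return 0;
}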
It all depends on how you read the line. For example:
char string[300];
FILE* fp = fopen(filename, "r");
//Error checking omitted
fgets(string, 300, fp);
Taken from tutorialspoint.com
The C library function char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.
That means that this will read 299 characters from the file at most. This will cause only a logical error (because you might not get all the data you need) that won't cause any undefined behavior.
But, if you do:
char string[300];
int i = 0;
FILE* fp = fopen(filename, "r");
do{
    string[i] = fgetc(fp);
    i++;
}while(string[i - 1] != '\n');
This can cause a segmentation fault because it writes past the end of the array on lines longer than 300 characters (there is no bounds check against the size of string).

Does the C standard guarantee buffers are not touched past their null terminator?

In the various cases that a buffer is provided to the standard library's many string functions, is it guaranteed that the buffer will not be modified beyond the null terminator? For example:
char buffer[17] = "abcdefghijklmnop";
sscanf("123", "%16s", buffer);
Is buffer now required to equal "123\0efghijklmnop"?
Another example:
char buffer[10];
fgets(buffer, 10, fp);
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
The C99 draft standard does not explicitly state what should happen in those cases, but by considering multiple variations, you can show that it must work a certain way so that it meets the specification in all cases.
The standard says:
%s - Matches a sequence of non-white-space characters.252)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept the
sequence and a terminating null character, which will be added automatically.
Here's a pair of examples that show it must work the way you are proposing to meet the standard.
Example A:
char buffer[4] = "abcd";
char buffer2[10]; // Note that this could be placed at what would be buffer+4
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\0"
// buffer2 = "4\0"
Example B:
char buffer[17] = "abcdefghijklmnop";
char* buffer2 = &buffer[4];
sscanf("123 4", "%s %s", buffer, buffer2);
// Result is buffer = "123\0" "4\0", i.e. the bytes '1','2','3','\0','4','\0'
Note that the interface of sscanf doesn't provide enough information to really know that these were different. So, if Example B is to work properly, it must not mess with the bytes after the null character in Example A. This is because it must work in both cases according to this bit of spec.
So implicitly it must work as you stated due to the spec.
Similar arguments can be placed for other functions, but I think you can see the idea from this example.
NOTE:
Providing size limits in the format, such as "%16s", could change the behavior. By the specification, it would be functionally acceptable for sscanf to zero out a buffer to its limits before writing the data into the buffer. In practice, most implementations opt for performance, which means they leave the remainder alone.
When the intent of the specification is to do this sort of zeroing out, it is usually explicitly specified. strncpy is an example. If the length of the string is less than the maximum buffer length specified, it will fill the rest of the space with null characters. The fact that this same "string" function could return a non-terminated string as well makes this one of the most common functions for people to roll their own version.
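For reference, a small sketch of the strncpy behaviour described above (strncpy_demo is just a hypothetical name):
#include <string.h>

void strncpy_demo(void)
{
    char dst[8];
    strncpy(dst, "ab", sizeof dst);        /* shorter source: the remaining 6 bytes are set to '\0' */
    strncpy(dst, "abcdefgh", sizeof dst);  /* source fills the limit: NO terminating '\0' is written */
}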
As far as fgets, a similar situation could arise. The only gotcha is that the specification explicitly states that if nothing is read in, the buffer remains untouched. An acceptable functional implementation could sidestep this by checking to see if there is at least one byte to read before zeroing out the buffer.
Each individual byte in the buffer is an object. Unless some part of the function description of sscanf or fgets mentions modifying those bytes, or even implies their values may change (e.g. by stating their values become unspecified), the general rule applies: (emphasis mine)
6.2.4 Storage durations of objects
2 [...] An object exists, has a constant address, and retains its last-stored value throughout its lifetime. [...]
It's this same principle that guarantees that
#include <stdio.h>
int a = 1;
int main() {
printf ("%d\n", a);
printf ("%d\n", a);
}
attempts to print 1 twice. Even though a is global and printf could conceivably access global variables, the description of printf doesn't mention modifying a, so a keeps its last-stored value.
Neither the description of fgets nor that of sscanf mentions modifying buffers past the bytes that actually were supposed to be written (except in the case of a read error), so those bytes don't get modified.
The standard is somewhat ambiguous on this, but I think a reasonable reading of it is that the answer is yes: it's not allowed to write more bytes to the buffer than it read, plus the null terminator. On the other hand, a stricter reading/interpretation of the text could conclude that the answer is no, there's no guarantee. Here's what a publicly available draft says about fgets.
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
There's a guarantee about how much it is supposed to read from the input, i.e. stop reading at newline or EOF and not read more than n-1 bytes. Although nothing is said explicitly about how much it's allowed to write to the buffer, the common knowledge is that fgets's n parameter is used to prevent buffer overflows. It's a little strange that the standard uses the ambiguous term "read", which may not necessarily imply that fgets can't write more than n bytes to the buffer, if you want to nitpick on the terminology it uses. But note that the same "read" terminology is used about both issues: the n limit and the EOF/newline limit. So if you interpret the n-related "read" as a buffer-write limit, then for consistency you can/should interpret the other "read" the same way, i.e. it does not write more than what it read when the string is shorter than the buffer.
On the other hand, if you distinguish between the uses of the phrasal verb "read into" (= "write") and just "read", then you can't read the committee's text the same way. You are guaranteed that it won't "read into" (= "write to") the array more than n bytes, but if the input string is terminated sooner by newline or EOF, you're only guaranteed that the rest (of the input) won't be "read"; whether that implies it won't be "read into" (= "written to") the buffer is unclear under this stricter reading. The crucial keyword is "into", which is elided, so the problem is whether the completion given by me in brackets in the following modified quote is the intended interpretation:
No additional characters are read [into the array] after a new-line character (which is retained) or after end-of-file.
Frankly, a single postcondition stated as a formula (and it would be pretty short in this case) would have been a lot more helpful than the verbiage I quoted...
I can't be bothered to try and analyze their writeup about the *scanf family, because I suspect it's going to be even more complicated given all the other things that happen in those functions; their writeup for fscanf is about five pages long... But I suspect a similar logic applies.
is it guaranteed that the buffer will not be modified beyond the null
terminator?
No, there's no guarantee.
Is buffer now required to equal "123\0efghijklmnop"?
Yes. But that's only because you've used correct parameters for your string-related functions. Should you mess up the buffer length, the input modifiers to sscanf and such, then your program will still compile, but it will most likely fail at runtime.
If the read line is only 3 characters long, can one be certain that the 6th character is the same as before fgets was called?
Yes. Once fgets() figures out that you have a 3-character input string, it stores the input in the provided buffer and doesn't care about the rest of the provided space at all.
Is buffer now required to equal "123\0efghijklmnop"?
Here buffer simply contains the string "123", guaranteed to be terminated with NUL.
Yes, the memory allocated for the array buffer will not get de-allocated; however, the "%16s" conversion restricts what you read into buffer to at most 16 char elements at any point in time. Whether you then store just a single char or the maximum the buffer can take depends on the input.
For example:
char buffer[4096] = "abc";
actually does something like the following:
memcpy(buffer, "abc", sizeof("abc"));
memset(&buffer[sizeof("abc")], 0, sizeof(buffer)-sizeof("abc"));
The standard requires that when a char array is only partially initialized, the remaining elements are initialized to zero, up to the array's boundary.
There are no guarantees from the standard, which is why the functions sscanf and fgets are recommended to be used with respect to the size of the buffer, as you show in your question (and using fgets is considered preferable to gets).
However, some standard functions rely on the null terminator in their work, e.g. strlen (but I suppose you are asking about string modification).
EDIT:
In your example
fgets(buffer, 10, fp);
leaving the characters after the 10th untouched is guaranteed (the existing content and length of buffer are not considered by fgets)
EDIT2:
Moreover, when using fgets keep in mind that the '\n' will be stored in the buffer, e.g.
"123\n\0fghijklmnop"
instead of expected
"123\0efghijklmnop"
Depends on the function in use (and to a lesser degree its implementation). sscanf will start writing when it encounters its first non-whitespace character, and continue writing until its first whitespace character, where it will add a finishing 0 and return. But a function like strncpy (famously) zeroes out the rest of the buffer.
There is, however, nothing in the C standard which mandates how these functions behave in this respect.

How to save a specific length string from a file and work with it in C

So what I'm trying to do is open a file and read it to the end, in blocks that are 256 bytes long each time the read function is called. My dilemma is whether to use fgets() or fread() to do it.
I was using fgets() initially, because it returns a string of the bytes that were read, which is great because I can store that data and work with it. However, in my particular file, the 256 bytes often span more than 2 lines, which is a problem because fgets() stops reading when it hits a newline character or the end of the file.
I then thought of using fread(), but I don't know how to get at the data I read with it, because fread() returns the number of elements successfully read (according to its documentation).
I've searched and thought of solutions for a while now and can't find anything that works with my particular scenario. I would like some guidance on how to go about this issue, how would you go about this in my position?
You can use fread() to read each 256 bytes block and keep a lineCount variable to keep track of the number of new line characters you have encountered so far in the input. Since you have to process the blocks already this wouldn't mean much of an overhead in the processing.
To read a block of 256 chars, which is what I think you are doing, you just need to create a buffer of chars that can hold 256 of them, in other words a char array of size 256.
#define BLOCK_SIZE 256
char block[BLOCK_SIZE];
Then if you check the documentation for fread() it shows the following signature:
Following is the declaration for fread() function.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
Parameters
ptr -- This is the pointer to a block of memory with a minimum size of size*nmemb bytes.
size -- This is the size in bytes of each element to be read.
nmemb -- This is the number of elements, each one with a size of size bytes.
stream -- This is the pointer to a FILE object that specifies an input stream.
So this means it takes a pointer to the buffer where it will write the read information, the size of each element it's supposed to read, the maximum amount of elements you want it to read and the file pointer. In your case it would be:
size_t read = fread(block, sizeof(char), BLOCK_SIZE, file);
This will copy the information from the file to the block array, which you can later process and keep track of the lines. The characters that were read by fread are in the block array, so the first char in the last read block would be block[0], the second block[1] and so on. The returned value in read indicates how many elements (in your case chars) were inserted in the array block when you call fread, this number will be equal to BLOCK_SIZE for every call, unless you reach the end of the file or there's an error.
I suggest you read some documentation for a full example, play a little with the code and do some reading on pointers in C to gain a better understanding of how everything works in general. If you still have questions after that, we can take it from there or you can create a new SO question.
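Putting the pieces together, a rough sketch of such a block-reading loop, including the lineCount idea from the first answer (the file name data.bin is only a placeholder):
#include <stdio.h>

#define BLOCK_SIZE 256

int main(void)
{
    FILE *file = fopen("data.bin", "rb");
    if (file == NULL)
        return 1;
    char block[BLOCK_SIZE];
    size_t got;
    long lineCount = 0;
    /* read up to 256 bytes per call; the last block may be shorter */
    while ((got = fread(block, sizeof(char), BLOCK_SIZE, file)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (block[i] == '\n')
                lineCount++;           /* newlines counted across block boundaries */
        }
        /* process block[0] .. block[got - 1] here */
    }
    printf("lines seen: %ld\n", lineCount);
    fclose(file);
    return 0;
}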

fread() size argument

I want to read some data from the file, the data will have different sizes at different times.
If I use the below code, then:
char dataStr[256];
fread(dataStr, strlen(dataStr), 1, dFd);
fread is returning 0 for the above call and not reading anything from the file.
But, if I give size as 1 then it successfully reads one char from the file.
What should be the value of size argument to the fread() function when we do not know how much is the size of the data in the file?
strlen counts the number of characters until it hits '\0'.
In this case you probably hit '\0' at the very first character, hence strlen returns 0 as the length and nothing is read.
You should use sizeof instead of strlen.
You can't do that, obviously.
You can read until a known delimiter, often line feed, using fgets() to read a line. Or you can read a known-in-advance byte count, using that argument.
Of course, if there's an upper bound on the amount of data, you can read that always, and then somehow inspect the data to see what you got.
Also, in your example you're using strlen() on the argument that is about to be overwritten; that implies it already contains a proper string of exactly the same size as the data that is going to be read. This seems unlikely; you probably mean sizeof dataStr there.
You should use:
fread(dataStr, 1, sizeof dataStr, dFd);
to indicate that you want to read the number of bytes equal to the size of your array buffer.
The reason why your code doesn't work is that strlen() finds the length of a null-terminated string, not the size of the buffer. In your case, you run it on an uninitialized buffer and simply get lucky: the first byte in the buffer happens to be '\0', so strlen(dataStr) returns 0, but it is just as likely to crash or return some random number greater than your buffer size.
Also note that fread() returns the number of items read, not the number of characters (I swapped the second and the third arguments so that each character is equivalent to one item).
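For illustration, a sketch of that corrected call with the return value checked (read_chunk is a hypothetical wrapper; dFd follows the question's naming):
#include <stdio.h>

void read_chunk(FILE *dFd)
{
    char dataStr[256];
    /* item size 1 byte, up to sizeof dataStr items, so the return value is a byte count */
    size_t got = fread(dataStr, 1, sizeof dataStr, dFd);
    if (got < sizeof dataStr) {
        if (feof(dFd)) {
            /* end of file: only the first 'got' bytes of dataStr are valid */
        } else if (ferror(dFd)) {
            /* read error */
        }
    }
}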
fread returns the number of blocks that were successfully read.
You can:
if( 1==fread(dataStr, 256, 1, dFd) )
puts("OK");
It always reads the full length of your defined data; fread does not stop at '\0'.
