Writing to file writes garbage on top of what I want - c

I have the code below:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main () {
int fd = open("filename.dat", O_CREAT|O_WRONLY|O_TRUNC, 0600);
int result = write(fd, "abcdefghijklmnopqrstuvxz", 100);
printf("\n\nfd = %d, result = %d, errno = %d", fd, result, errno);
close(fd);
return 0;
}
I am trying to understand what happens when I try to write more bytes to a file than I have available. So I am calling write and asking the program to write 100 bytes while I have much less than that. The result: a bunch of stuff from stdout ends up on filename.dat. If instead of 100 I use strlen("abcdefghijklmnopqrstuvxz"), I get the desired result. My question then is: why is the program trying to write beyond the '\0' character on my string? Is there some undefined behavior going on here?

My question then is: why is the program trying to write beyond the
'\0' character on my string?
The function write(2) doesn't care about 0-terminators. It actually doesn't care about buffer contents at all: it will try to write as many bytes as you tell it.
Is there some undefined behavior going on here?
Yes. Asking write() for more bytes than your buffer holds makes it read past the end of the string, and if that range touches inaccessible memory the OS may fail the call or terminate your process.

The write() function you are using does not care about the content. It just writes the number of bytes you tell it to write to the file.
So when you tell it to write 100 bytes but provide fewer than 100, the remaining bytes come from whatever happens to sit in memory after your string, which shows up as garbage.
But when you use strlen("abcdefghijklmnopqrstuvxz"), you are asking write() to write exactly as many bytes as the string is long, so it works fine there.
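A minimal sketch of that fix, assuming a POSIX system. The helper name write_string is hypothetical and only there for illustration; the point is that the length passed to write() comes from the string itself rather than from a guessed constant.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write exactly the characters of a C string
 * (not its terminator) to fd. Returns write()'s result: the number of
 * bytes written, or -1 with errno set. */
ssize_t write_string(int fd, const char *s)
{
    /* strlen() counts the bytes before the '\0', so write() is never
     * asked to read past the end of the string. */
    return write(fd, s, strlen(s));
}
```

Called as write_string(fd, "abcdefghijklmnopqrstuvxz"), this asks for 24 bytes instead of 100.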

Because there are two ways to represent a string. There is the null-terminated version, and there is the version where you pass a pointer to the first byte together with a size. write uses the second one. It needs a pointer to where your data begins and a length to know how much data it should copy to the file, and it does not look for null bytes. Sometimes such functions simply wrap a memcpy.
So when you passed a length of 100, the memory after your abcdefghijklmnopqrstuvxz happened to hold your "bunch of stdout stuff". That's why you see garbage. You were lucky, because this kind of out-of-bounds access can easily end in a SEGFAULT!

My question then is: why is the program trying to write beyond a \0? Because you asked it to write 100 chars.
Is there some undefined behavior going on here? If you increase that 100 to a large number and the range reaches into memory your process is not allowed to access, it is undefined behaviour.

I think that the basic issue here is that you're thinking of C strings as values, you think you're passing this value to the write function, and the write function is writing out your value plus extra junk.
C is lower level than that. In C, we don't really pass strings around, instead we pass pointers to strings, which are 'char *' values but with the added promise that they point to a valid block of memory that should be treated as a null-terminated string.
The write() function doesn't care about the null-terminated string convention. The parameters in the write call provide a file descriptor, a pointer to the data, and a byte count.
Also, the compiler converts string constants into const char arrays. The equivalent of this happens at the top level:
const char stringconst00001[] = { 'a', 'b', 'c', ... 'x', 'z', '\0' };
And it does this in main():
int result = write(fd, stringconst00001, 100);
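To make the size of that hidden array concrete, here is a small sketch. The names text, text_storage_size and text_length are made up for illustration; the literal is the one from the question.

```c
#include <string.h>

/* The literal from the question, stored as an array so sizeof can see it. */
static const char text[] = "abcdefghijklmnopqrstuvxz";

/* Number of bytes that actually belong to the array, terminator included. */
size_t text_storage_size(void)
{
    return sizeof text;
}

/* Number of characters before the terminator. */
size_t text_length(void)
{
    return strlen(text);
}
```

The array holds 25 bytes (24 letters plus the '\0'), so a write of 100 bytes has to take the other 75 from whatever follows the array in memory.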

Related

How to sprintf to write a string without warnings about restrictions?

I have been using this code:
char options_string[96];
sprintf(options_string,"%s_G%u", options_string, options.allowed_nucleotide_gap_between_CpN);
which is just writing unsigned integers to a string mixed with some letters.
But the new version 9 of GCC, which I just started using, warns me:
warning: passing argument 1 to restrict-qualified parameter aliases with argument 3 [-Wrestrict]
 1012 | sprintf(options_string,"%s_G%u", options_string, options.allowed_nucleotide_gap_between_CpN);
      |         ^~~~~~~~~~~~~~           ~~~~~~~~~~~~~~
I've read that the best way to make a string like this is to use sprintf, as I have: How to convert an int to string in C?
I've re-checked the code, and I'm not using any restrict keywords.
How can I write to this string without the warning?
The code causes undefined behaviour because the same part of a char buffer is used as both input and output for sprintf. The warning is useful information in this case. To be correct you must change the code so there is no overlap between inputs and outputs.
For example you could find the end of the current string and start writing from there. Also it would be wise to guard against buffer overflows in the length of output.
Possible code:
char options_string[96];
// ...assume you omitted some code that writes some valid string
size_t upto = strlen(options_string);
int written = snprintf(options_string + upto, sizeof options_string - upto,
"_G%u", options.allowed_nucleotide_gap_between_CpN);
if ( written < 0 || written + upto >= sizeof options_string )
{
// ...what you want to do if the options don't fit in the buffer
}
A conforming implementation of sprintf could start by writing a zero byte to the destination, then replacing that byte with the first byte of output (if any) and writing a zero after that, then replacing that second zero byte with the next byte of output and writing a third zero, etc. Such an approach would avoid the need to have it take any particular action (such as writing a terminating zero) after processing the last byte. An attempt to use your code with such an implementation, however, would fail since options_string would effectively get cleared before code could read it.
The warning you're receiving is thus an indication that your code may not work as written.
In your case it is better to use the strcat function instead of sprintf, since you want to concatenate strings. Note that strcat only appends strings, so the number still has to be formatted into a separate buffer first.
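A sketch of how the strcat suggestion could look, under the assumption that the number is first formatted into a separate temporary buffer (strcat itself cannot format an unsigned). The helper name append_gap_option and the 32-byte temporary are illustrative choices, not something from the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append "_G<value>" to the string in dst, which has
 * room for dst_size bytes in total. Returns 0 on success, or -1 if the
 * result would not fit (dst is then left unchanged). */
int append_gap_option(char *dst, size_t dst_size, unsigned value)
{
    char piece[32];   /* temporary buffer, separate from dst */
    int n = snprintf(piece, sizeof piece, "_G%u", value);
    if (n < 0 || (size_t)n >= sizeof piece)
        return -1;

    size_t used = strlen(dst);
    if (used + (size_t)n + 1 > dst_size)   /* +1 for the terminator */
        return -1;

    strcat(dst, piece);   /* source and destination do not overlap */
    return 0;
}
```

Because piece and dst are different arrays, nothing aliases and -Wrestrict has nothing to complain about.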

C UNIX - read() reads non-existent letters

I've got a little problem while experimenting with some C code. I've tried to use read()-Command to read a text out of a file and store the results in a charArray. But when I print the results they're always different from the file.
Here is the code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
void main() {
int fd = open("file", 2);
char buf[2];
printf("Read elements: %ld\n", read(fd, buf, 2));
printf("%s\n", buf);
close(fd);
}
The file "file" was created in the same directory using the following UNIX commands:
cat > file
Hi
So it contains just the word "Hi". When I run it, I expect it to read 2 bytes from the file (which are 'H' and 'i') and store them at buf[0] and buf[1]. But when I print the result, it appears that there was an issue, because besides the word "Hi" several weird characters are printed (indicating a memory reading/writing problem, I guess, due to a bad buffer size). I've tried to increase the size of the buf array, and when I change the size, the weird characters printed change. The problem goes away when the size reaches 32 bytes.
Can someone explain to me in detail why this is happening?
I've understood so far that read() does not store a '\0' after the bytes it reads, and that the third parameter of read() indicates the maximum number of bytes to read.
Another thing I've noticed while experimenting with the above code is the following: let's assume one changes the third parameter (maximum bytes to read) of read() to 3, and the size of the buf array to 512 (overkill, I know, but I really wanted to see what would happen). Now read will actually read a third character (in my case 'e') and store it in the buffer, even though this third character does not exist.
I've searched Stack Overflow for a while now and found many similar cases, but none of them made me understand my problem. If there is any other thread I missed, it would be a pleasure if you could link me to it.
Finally, sorry for my bad English; it's not my native language.
Clearly you need to make buf 3 bytes long and set the last byte to the null byte (0 or '\0') yourself, since read() won't do it for you. That way, when you print the string, printf doesn't carry on reading memory until it finds another 0!
The way strings (char arrays, really) are handled in C is quite straightforward. Most, if not all, string functions either assume that their string parameters are null-terminated (puts) or return null-terminated strings (strdup).
The point is that the computer can't tell where a string ends unless it is given the string's size each time it processes it. The simplest way around this was to append a 0 (the null byte) after each string. That way, the code just needs to iterate over the string's characters and stop when it finds the terminating character (another name for the null byte).
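Something like the following sketch shows the idea, assuming a POSIX system. The helper name read_as_string is hypothetical: it reserves one byte of the buffer for the terminator and stores it right after whatever read() delivered.

```c
#include <unistd.h>

/* Hypothetical helper: read up to cap - 1 bytes from fd into buf and
 * null-terminate the result. Returns the number of bytes read, or -1 on
 * error (buf then holds an empty string). cap must be at least 1. */
ssize_t read_as_string(int fd, char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap - 1);   /* keep one byte for '\0' */
    buf[n > 0 ? n : 0] = '\0';            /* terminate after the data */
    return n;
}
```

With char buf[3] and read_as_string(fd, buf, sizeof buf), the file from the question yields the two bytes 'H' and 'i' followed by the terminator, so printf("%s\n", buf) stops where it should.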

Pthreads, fread(), and printf(): Getting random D4's in my string

The Scoop:
I am creating a function that runs through a lengthy file in chunks using pthreads. I am calling fread() to read the file in this sort of fashion:
fread( thread_data[i].buffer, 1, 50, f )
/*
thread_data is a data structure for each thread (hence i)
buffer is in thread_data as an array of length 50
*/
I am then directly calling a print statement to see what each thread is doing, as a weird pattern was showing up in some of the parts that I was printing. Namely, my print statement would look something like this:
this is suppose to be 50 characters, but it is only a fewgD4
That D4 directly above is what I have my question on. Every thread that I make, at the end of the string, we are printing D4, and in this case, followed by a g. Other times, it is followed by a d, and most commonly a �. Now, I did read the wikipedia page on this character, which states:
replacement character used to replace an unknown or unrepresentable character
My question:
What kind of an error am I running into? Why is the end of each read statement containing unknown characters, especially the weird gD4 guy?
Aside:
I am trying to make a function in C that utilizes pthreads to find the frequency of each word in a file, in case anyone was wondering. These weird characters were showing up in my list, which is something that I find slightly unpleasant. Finally, don't bother linking me to the Obligatory Unicode article; I am already aware of it, and the characters are not outside of what I am working with.
The strings you are printing out are not null-terminated — fread() does not null-terminate its output, it simply reads in as many raw bytes as you asked for (or fewer). So when you print out your buffer, your print function is walking past the end of the data and printing out whatever garbage memory comes after the buffer, which in your case just happens to be gD4.
You need to either explicitly null-terminate your buffer; or, if your print function supports it, tell it exactly how many characters to print. Either way, you need to save the return value from fread to know how many characters you read. For example:
// Read at most 49 bytes so the 50-byte buffer keeps room for the terminator
size_t n = fread(thread_data[i].buffer, 1, 49, f);
if (n < 49 && ferror(f)) /* Handle error */ ;
// Explicitly add a null terminator right after the bytes that were read
thread_data[i].buffer[n] = 0;
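The other option mentioned above, telling the print function how many characters to print, could be sketched like this. The helper name echo_chunk is made up, and it uses a local 50-byte buffer instead of the thread_data structure, whose full definition is not shown in the question.

```c
#include <stdio.h>

/* Hypothetical helper: read up to 50 bytes from in and print exactly the
 * bytes that were read to out, followed by a newline. No terminator is
 * needed. Returns the number of bytes read. */
size_t echo_chunk(FILE *in, FILE *out)
{
    char buffer[50];
    size_t n = fread(buffer, 1, sizeof buffer, in);
    /* The precision passed through '*' caps how many bytes %s may read. */
    fprintf(out, "%.*s\n", (int)n, buffer);
    return n;
}
```

Since printf never looks past the first n bytes, it cannot run into whatever happens to follow the buffer in memory.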

What does write() write if null terminator is already reached?

For write(fd[1], string, size) - what would happen if string is shorter than size?
I looked up the man page but it doesn't clearly specify that situation. I know that read would simply stop there and read whatever string is, but that's certainly not the case for write. So what is write doing? The return value is still size, so is it appending a null terminator? Why doesn't it just stop like read?
When you call write(), the system assumes you are writing generic data to some file - it doesn't care that you have a string. A null-terminated string is seen as a bunch of non-zero bytes followed by a zero byte - the system will keep writing out until it's written size bytes.
Thus, specifying a size that is longer than your string could be dangerous. The system is likely writing data from beyond the end of the string out to your file, probably garbage.
write will write size bytes of data starting at string. If string is an array shorter than size, the behaviour is undefined. But in your previous question, char *line = "apple"; contains 6 characters (i.e. a, p, p, l, e and the null character).
So it is best to call write with size set to the correct value.
write(int fildes, const void *buf, size_t nbyte) does not write null terminated strings. It writes the content of a buffer. If there are any null characters in the buffer they will be written as well.
read(int fildes, void *buf, size_t nbyte) also pays no attention to null characters. It reads a number of bytes into the given buffer, up to a maximum of nbyte. It does not add any null terminating bytes.
These are low level routines, designed for reading and writing arbitrary data.
The write call outputs a buffer of the given size. It does not attempt to interpret the data in the buffer. That is, you give it a pointer to a memory location and a number of bytes to write (the length) then, as long as those memory locations exist in a legal portion of your program's data, it will copy those bytes to the output file descriptor.
Unlike the string manipulation routines, write (and read, for that matter) ignores null bytes, that is, bytes with the value zero. read does stop at end-of-file and, on certain devices, will return only the data available at the time, perhaps less than requested, but both calls operate on raw bytes without interpreting them as "strings".
If you attempt to write more data than the buffer contains, it may or may not work depending on the position of the memory. At best the behavior is undefined. At worst you'll get a segmentation fault and your program will crash.
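Because read may return fewer bytes than requested, code that needs a specific amount usually loops. A rough sketch, assuming a POSIX system; read_full is a hypothetical name, not a standard function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until want bytes have arrived,
 * end-of-file is reached, or an error occurs. Returns the number of bytes
 * stored in buf, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n == 0)
            break;              /* end of file: nothing more will arrive */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return -1;
        }
        got += (size_t)n;       /* short read: ask for the rest */
    }
    return (ssize_t)got;
}
```

Note that the function only counts bytes. Any zero bytes in the data are copied like everything else, and no terminator is added.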

Why output length is coming 6?

I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There are several issues with your program.
First off, you're defining a char buffer that is way too short: a 1-char buffer can hold only one string, the empty one. This is because you need a null at the end of the string to terminate it.
Next, you're using the gets function, which is very unsafe (as your compiler almost certainly warned you), as it just blindly takes input and copies it into a buffer. As your buffer is 0+terminator characters long, any non-empty input is written past the end of the buffer into other areas of memory, which could and probably do contain important information, such as the function's saved return address. This is the classic method of smashing the stack.
Third, you're passing the return value of one printf call to another printf. printf doesn't return the formatted string; it returns the number of characters printed. There are other functions for formatting into a string. Generally the one you will want to use is sprintf, passing it a buffer.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get anything. str should be at least 2 bytes if you are not passing an empty string.
When you declare a variable, some space is reserved to store the value. The reserved space can be space that was previously used by some other code and still holds values. When the variable goes out of scope or is freed, the value is not erased (or it may be, anything goes); only the program's access to that variable is revoked.
When you read from an uninitialised location you can get anything. This is undefined behaviour, and that is what you are doing.
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
For the above program your input is "(null)", so you are getting "(null)6". Here "6" is the return value of the inner printf (the number of characters successfully printed).
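For comparison, a sketch of a safer way to get the length of an input line, swapping gets for fgets with a caller-supplied buffer size. The helper name read_line_length is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into buf (at most cap - 1
 * characters), strip the trailing newline, and return the length of the
 * line. Returns -1 if nothing could be read. */
int read_line_length(FILE *in, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline fgets keeps */
    return (int)strlen(buf);
}
```

With something like char str[128]; and read_line_length(stdin, str, sizeof str), fgets never writes more than the buffer can hold, so the stack stays intact however long the input is.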

Resources