Terminate string full of garbage? - c

Does C allow placing a string terminator at the end of bytes read that are full of garbage, or is that only guaranteed if the bytes read are chars?
I need to read something like this from stdin, but I do not know how many chars to read, and EOF is not guaranteed:
Hello World!---full of garbage until 100th byte---
char *var = malloc(100 + 1);
read(0, var, 100); // read from stdin. Unfortunately, I do not know how many bytes to read and stdin is not guaranteed to hold an EOF. (I chose 100 as an educated guess.)
var[100] = '\0'; // Is it possible to place a terminator at the end if most of the read bytes are garbage ?

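read() returns the number of bytes actually read into the buffer (0 at end of file, -1 on error), and it may return fewer bytes than requested; a terminal, for example, typically delivers one line per call. Hence the following should work:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
ssize_t n; // read() returns ssize_t, not int
char *var = malloc(100 + 1); // +1 for the terminator
n = read(0, var, 100);
if (n >= 0)
var[n] = '\0'; // terminate right after the bytes actually read
else
perror("read"); /* error */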

It is possible to place a terminator at the end, but the end result might be "Hello World!" followed by a long string of garbage.
Bytes are always chars. If you wanted to accept only printable characters (which the garbage at the end might contain anyway), you could read the input one character at a time and check whether each byte's value is between 0x20 and 0x7E inclusive.
Note that this range check only works for ASCII text.
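A minimal sketch of that check, assuming the bytes are already in memory (for example, in var after the read() above) and that you want to stop at the first byte outside the printable ASCII range 0x20..0x7E. The function name copy_printable_prefix is hypothetical, not a library function.

```c
#include <stddef.h>

/* Hypothetical helper: copy bytes from `in` to `out` until the first byte
 * outside the printable ASCII range 0x20..0x7E, then null-terminate.
 * `out` must have room for at least n + 1 bytes.
 * Returns the number of bytes copied (not counting the '\0'). */
size_t copy_printable_prefix(const char *in, size_t n, char *out)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x20 || c > 0x7E)
            break;              /* first non-printable byte ends the text */
        out[i] = (char)c;
        i++;
    }
    out[i] = '\0';
    return i;
}
```

Garbage bytes that happen to be printable still get through, which is the limitation the answer points out.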

Related

File I/O parsing in C giving garbage characters at end of string

I have a char array buffer that I am using to store characters that the user will input one by one. My code below works but has a few glitches that I can't figure out:
when I execute a printf to see what's in Buffer, it does fill up, but I get garbage characters at the end
it won't stop at 8 characters despite being declared as char Buffer[8];
Can somebody please explain to me what is going on and perhaps how I could fix this? Thanks.
char Buffer[8]; //holds the byte stream
int i=0;
if (/* user input event has occurred */)
{
Buffer[i] = charInput;
i++;
// Display a response to input
printf("Buffer is %s!\n", Buffer);
}
Output:
tagBuffer is 1┬┬w!
tagBuffer is 12┬w!
tagBuffer is 123w!
tagBuffer is 1234!
tagBuffer is 12345!
tagBuffer is 123456=!
tagBuffer is 1234567!
tagBuffer is 12345678!
tagBuffer is 123456789!
You have to end the string with a \0 character. That's why they are called zero terminated strings.
It is also wise to allocate 1 extra char to hold the \0.
The only thing you are passing to the printf() function is a pointer to the first character of your string. printf() has no way of knowing the size of your array. (It doesn't even know if it's an actual array, since a pointer is just a memory address.)
printf() and all the standard C string functions assume that there is a 0 at the end of your string. printf(), for example, will keep printing characters from memory, starting at the char that you pass to the function, until it hits a 0.
Therefore you should change your code to something like this:
char Buffer[9]; // holds up to 8 characters plus the '\0'
int i=0;
if (/* user input event has occurred */)
{
Buffer[i] = charInput;
i++;
Buffer[i] = 0; // You can also assign the char '\0' to it to get the same result.
// Display a response to input
printf("Buffer is %s!\n", Buffer);
}
In addition to the previous comments about zero termination, you also have to accept responsibility for not overflowing your own buffer. It doesn't stop at 8 characters because your code is not stopping! You need something like the following (piggy-backing onto Jeremy's suggestion):
#define DATA_LENGTH 8
#define BUFFER_LENGTH (DATA_LENGTH + 1)
char Buffer[BUFFER_LENGTH]; // holds the byte stream plus the '\0'
int charPos = 0; // index of the next character position to fill
while (charPos < DATA_LENGTH) { // user input event has occurred
Buffer[charPos] = charInput;
Buffer[charPos + 1] = '\0'; // at most Buffer[DATA_LENGTH], still inside the array
// Display a response to input
printf("Buffer is %s!\n", Buffer);
charPos++;
}
In other words, make sure to stop accepting data when the maximum length has been reached, regardless of what the environment tries to push at you.
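The same idea can be packaged as a small reusable function. This is a minimal sketch in which the caller passes the buffer capacity explicitly, so the terminator always fits; append_char is a hypothetical name, not a library function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: append c to buf (capacity cap bytes, including the
 * '\0'), keeping buf null-terminated after every call.
 * *len is the current string length. Returns false, leaving buf
 * unchanged, when there is no room for another character. */
bool append_char(char *buf, size_t cap, size_t *len, char c)
{
    if (cap == 0 || *len + 1 >= cap)
        return false;           /* need room for c and the '\0' */
    buf[*len] = c;
    (*len)++;
    buf[*len] = '\0';
    return true;
}
```

With char Buffer[9], the loop body would become something like `if (!append_char(Buffer, sizeof Buffer, &len, charInput)) break;`, so at most 8 characters are ever accepted.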
If you are programming in C or C++, you have to remember that:
1) strings are terminated with a \0 character.
2) C does not perform bounds checking on strings; they are just character arrays.
It's odd that no-one has mentioned this possibility:
char Buffer[8]; //holds the byte stream
int i = 0;
while (i < sizeof(Buffer) && (charInput = get_the_users_character()) != EOF)
{
Buffer[i] = charInput;
i++;
// Display a response to input
printf("Buffer is %.*s!\n", i, Buffer);
}
This notation in the printf() format string specifies the maximum length of the string to be displayed, and does not require null termination (though null termination is ultimately the best way to go -- at least once you leave this loop).
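A short illustration of the %.*s precision: the array below has no terminator at all, yet formatting it is safe because the precision caps how many bytes are read. snprintf is used instead of printf here only so the result can be checked; format_prefix is a hypothetical name.

```c
#include <stdio.h>

/* Format the first `len` bytes of a possibly unterminated array.
 * The precision argument for %.*s must be an int, and it must not
 * exceed the array size when there is no '\0' in the array. */
int format_prefix(char *out, size_t outsz, const char *data, int len)
{
    return snprintf(out, outsz, "Buffer is %.*s!", len, data);
}
```

Called with {'1', '2', '3'} and a length of 2, this produces "Buffer is 12!" without ever looking at the third byte.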
The while loop is more plausible than a simple if, and this version ensures that you do not overflow the end of the buffer. It does not, however, leave room for a trailing NUL '\0'. If you want one, loop while i < sizeof(Buffer) - 1 and then add the NUL after the loop.
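A minimal sketch of that variant. Since get_the_users_character() in the answer is a placeholder, this version assumes the characters come from a stdio stream and uses fgetc on a FILE * instead; read_bounded is a hypothetical name.

```c
#include <stdio.h>

/* Read at most cap - 1 characters from fp into buf, then null-terminate.
 * Returns the number of characters stored. Stops early at EOF. */
size_t read_bounded(FILE *fp, char *buf, size_t cap)
{
    size_t i = 0;
    int c;                      /* int, not char, so EOF is distinguishable */
    if (cap == 0)
        return 0;
    while (i < cap - 1 && (c = fgetc(fp)) != EOF)
        buf[i++] = (char)c;
    buf[i] = '\0';              /* the slot reserved by stopping at cap - 1 */
    return i;
}
```

Because the length check comes first in the condition, no character is consumed once the buffer is full; anything left over stays in the stream for the next read.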
Since Buffer is not initialized, its 8 elements start out holding garbage values.
From the observed output, the 2nd through 8th elements and the two memory locations immediately after the array (outside it) appear to hold '┬', '┬', 'w', '\0', '\0', '=', '\0', '\0', '\0'.
printf with %s consumes characters until it sees a null character ('\0'). That is why, in every iteration, as the array elements are assigned one by one, the buffer is printed up to the point where a garbage '\0' happens to be present.
In other words, passing a character array that does not end with '\0' to printf's %s (or to any string function) is undefined behavior. You can avoid this by reserving an extra byte for '\0' at the end of the buffer.
You might also want to look into using a stringstream (if you are using C++).

How to properly print file content to the command line in C?

I want to print the contents of a .txt file to the command line like this:
main() {
int fd;
char buffer[1000];
fd = open("testfile.txt", O_RDONLY);
read(fd, buffer, strlen(buffer));
printf("%s\n", buffer);
close(fd);
}
The file testfile.txt looks like this:
line1
line2
line3
line4
The program prints only the first 4 letters: line.
When using sizeof instead of strlen the whole file is printed.
Why is strlen not working?
It is incorrect to use strlen at all in this program. Before the call to read, the buffer is uninitialized and applying strlen to it has undefined behavior. After the call to read, some number of bytes of the buffer are initialized, but the buffer is not necessarily a proper C string; strlen(buffer) may return a number having no relationship to the amount of data you should print out, or may still have UB (if read initialized the full length of the array with non-nul bytes, strlen will walk off the end). For the same reason, printf("%s\n", buffer) is wrong.
Your program also can't handle files larger than the buffer at all.
The right way to do this is by using the return value of read, and write, in a loop. To tell read how big the buffer is, you use sizeof. (Note: if you had allocated the buffer with malloc rather than as a local variable, then you could not use sizeof to get its size; you would have to remember the size yourself.)
#include <unistd.h>
#include <stdio.h>
int main(void)
{
char buf[1024];
ssize_t n;
while ((n = read(0, buf, sizeof buf)) > 0)
write(1, buf, n);
if (n < 0) {
perror("read");
return 1;
}
return 0;
}
Exercise: cope with short writes and write errors.
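One possible answer to the exercise, as a sketch assuming POSIX read/write semantics: write_all (a hypothetical name) retries after short writes and after EINTR, and copy_fd reports any other read or write error to the caller.

```c
#define _POSIX_C_SOURCE 200809L /* for POSIX declarations under strict -std */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes of buf to fd, looping over short writes.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += w;               /* skip what was already written */
        n -= (size_t)w;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd until EOF.
 * Returns 0 on success, -1 on a read or write error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[1024];
    for (;;) {
        ssize_t r = read(in_fd, buf, sizeof buf);
        if (r == 0)
            return 0;           /* EOF */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)r) < 0)
            return -1;
    }
}
```

To print the question's file, you would open it with open("testfile.txt", O_RDONLY) (declared in <fcntl.h>) and call copy_fd(fd, 1).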
When using sizeof instead of strlen the whole file is printed. Why is strlen not working?
Because strlen walks through the char array passed in and counts characters until it encounters a 0. In your case, buffer is not initialized, so strlen reads elements of an uninitialized array while looking for a 0, and reading indeterminate values this way gives you undefined behavior.
sizeof works differently and returns the number of bytes of the passed object directly without looking for a 0 inside the array as strlen does.
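To make the difference concrete, the sketch below uses a 1000-byte array that, unlike the question's, is deliberately initialized so that strlen is well defined. The name demo_buffer is illustrative only.

```c
#include <string.h>

/* A 1000-byte array holding a 4-character string. */
static char demo_buffer[1000] = "line";

/* sizeof demo_buffer is the array size, fixed at compile time: 1000.
 * strlen(demo_buffer) counts bytes up to the first '\0': 4. */
size_t array_size(void) { return sizeof demo_buffer; }
size_t string_length(void) { return strlen(demo_buffer); }
```

A strlen result of 4 on the question's garbage buffer would explain why read was asked for only 4 bytes and printed "line".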
As correctly noted in other answers, read will not null-terminate the string for you, so you have to do it manually or declare buffer as:
char buffer[1000] = {0};
In this case, printing the buffer with printf and %s after reading the file will work, but only if read did not fill the entire array with non-zero bytes (and the file itself contains no 0 bytes).
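A sketch that does not depend on {0} initialization: reserve one byte for the terminator, keep calling read until the buffer is full or EOF is reached, and terminate at the count actually read. read_text is a hypothetical name, and the result is only a meaningful C string if the input contains no 0 bytes.

```c
#define _POSIX_C_SOURCE 200809L /* for POSIX declarations under strict -std */
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to cap - 1 bytes from fd into buf and null-terminate them.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_text(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    if (cap == 0)
        return -1;
    while (total < cap - 1) {
        ssize_t r = read(fd, buf + total, cap - 1 - total);
        if (r == 0)
            break;              /* EOF */
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: try again */
            return -1;
        }
        total += (size_t)r;
    }
    buf[total] = '\0';          /* always inside the array */
    return (ssize_t)total;
}
```

A file larger than cap - 1 bytes is silently truncated here; for printing whole files, the read/write loop from the first answer is still the better fit.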
Extra:
Null-terminating a string means you append a 0 to it somewhere. This is how most string-related functions determine where the string ends.
Why is strlen not working?
Because when you call it in read(fd, buffer, strlen(buffer));, you haven't yet assigned a valid string to buffer. It contains some indeterminate data which may or may not have a 0-valued element. Based on the behavior you report, buffer just so happens to have a 0 at element 4, but that's not reliable.
The third parameter tells read how many bytes to read from the file descriptor; if you want to read as many bytes as buffer can hold, use sizeof buffer. read returns the number of bytes read from fd (0 at EOF, -1 on error). read does not zero-terminate the input, so using strlen on buffer after calling read would still be an error.

Strange behavior of sscanf

I found something strange. Here is an example of the code:
...
char *start = strchr(value, '(');
if(start)
{
char buf[LEN];
memset(buf, 0, LEN);
int num = sscanf(start, "(%s)", buf);
if(num)
{
buf[strlen(buf) - 1] = '\0';
sprintf(value, "%s", buf);
}
...
If value is "(xxx)", for example, then value will be "xxx" after these actions.
But if value is "([34]xx{4,7}| 1234567890)" then value will be "[34]xx{4,7}".
Can anyone explain it?
P.S. It's an ARM platform.
int num = sscanf(start, "(%s)", buf);
Here, the %s conversion stops reading when it encounters whitespace in the buffer pointed to by start, and you have a space in your input string:
"([34]xx{4,7}| 1234567890)"
^ space here
sscanf returns the number of input items successfully matched and assigned. Here it returns 1, so num is 1, and buf holds "[34]xx{4,7}|". Next, this statement in your if block overwrites the last character in buf (the '|'):
buf[strlen(buf) - 1] = '\0';
That explains your program's output. Now, a few things about your code:
You don't need to do memset(buf, 0, LEN);. Simply do char buf[LEN] = {0}; This fills the array with the null byte.
sscanf doesn't check the bounds of buf, into which it writes the string it reads from start. If buf is not large enough, sscanf will write into the memory beyond it. This is undefined behavior and can even crash the program through an illegal memory access. You should give a field width in the sscanf format string to guard against buffer overrun.
#define STRINGIFY(s) #s // the preprocessor operator # stringifies the token s
#define XSTRINGIFY(s) STRINGIFY(s)
#define LEN 10 // max buffer length without the null byte
// inside a function
char buf[LEN + 1]; // +1 for the null byte
const char *format = "(%" XSTRINGIFY(LEN) "s)"; // "(%10s)"
int num = sscanf(start, format, buf);
The 10 in the format string "(%10s)" means that at most 10 characters are stored in the buffer pointed to by buf, after which a null byte \0 is added automatically. Hence you don't need the following in the if block to terminate the string:
buf[strlen(buf) - 1] = '\0'; // overwrites the last char before the null byte in buf
That line does not add a terminator; it overwrites the last character in buf, because strlen doesn't count the null byte. (With input like "(xxx)", %s also consumes the closing ')', so that line happens to strip it; it cannot help when the input contains a space.)
sscanf stops a %s conversion when it encounters whitespace. That is why you get "[34]xx{4,7}" as the output instead of the result you expected.
The format string consists of a sequence of directives which describe how to process the sequence of input characters. If processing of a directive fails, no further input is read, and scanf() returns. A "failure" can be either of the following: input failure, meaning that input characters were unavailable, or matching failure, meaning that the input was inappropriate (see below).
In your case, sscanf matches the starting (, and then parses the next token, %s which consumes data up to the first whitespace character. sscanf then fails to match a ), which means that the parsing stops. One token was successfully read and assigned, so the return value is 1.
Note that when using scanf, you cannot detect matching failures that occur after the last token that is assigned.
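One way around both problems, offered as a sketch rather than any answerer's method: use the scanset %[^)] so that spaces are accepted, give it a field width, and put %n after the literal ) to find out whether the closing parenthesis actually matched. extract_parens and the 63-character limit are assumptions for illustration.

```c
#include <stdio.h>

/* Copy the text between a leading '(' and the first ')' into out,
 * which must hold at least 64 bytes. Returns 1 only if the closing ')'
 * was actually matched, 0 otherwise. */
int extract_parens(const char *start, char out[64])
{
    int end = -1;   /* %n stores here only if everything before it matched */
    /* %63[^)] reads up to 63 chars that are not ')', spaces included. */
    if (sscanf(start, "(%63[^)])%n", out, &end) != 1)
        return 0;   /* no '(' or nothing between the parentheses */
    return end >= 0;
}
```

%n does not count toward sscanf's return value, so checking end is what detects the trailing matching failure that the return value alone cannot. With "([34]xx{4,7}| 1234567890)" the whole inner text, space included, ends up in out.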

I don't understand the behavior of fgets in this example

While I could use strings, I would like to understand why this small example I'm working on behaves this way, and how I can fix it.
int ReadInput() {
char buffer [5];
printf("Number: ");
fgets(buffer,5,stdin);
return atoi(buffer);
}
void RunClient() {
int number;
int i = 5;
while (i != 0) {
number = ReadInput();
printf("Number is: %d\n",number);
i--;
}
}
This should, in theory or at least in my head, let me read 5 numbers from input (albeit overwriting them).
However this is not the case, it reads 0, no matter what.
I understand printf puts a \0 null terminator ... but I still think I should be able to either read the first number, not just have it by default 0. And I don't understand why the rest of the numbers are OK (not all 0).
CLARIFICATION: I can only read 4/5 numbers, first is always 0.
EDIT:
I've tested and it seems that this was causing the problem:
main.cpp
scanf("%s",&cmd);
if (strcmp(cmd, "client") == 0 || strcmp(cmd, "Client") == 0)
RunClient();
somehow.
EDIT:
Here is the code if someone wishes to compile. I still don't know how to fix
http://pastebin.com/8t8j63vj
FINAL EDIT:
Could not get rid of the error. Decided to simply add the following to ReadInput:
int ReadInput(BOOL check) {
...
if (check)
printf ("Number: ");
...
And to RunClient():
void RunClient() {
...
ReadInput(FALSE); // a pseudo buffer flush (not really, but it ignores the line with garbage data)
while (...) {
number = ReadInput(TRUE);
...
}
And call it a day.
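A likely explanation for the behavior described in the EDIT, assuming cmd is a char array: scanf("%s", ...) stops at the newline and leaves it in stdin, so the first fgets call returns immediately with just "\n", and atoi("\n") is 0. A common fix is to discard the rest of the line after scanf. discard_line below is a hypothetical helper; the check runs it on a temporary file instead of stdin.

```c
#include <stdio.h>

/* Consume characters up to and including the next '\n' (or EOF),
 * e.g. the newline that scanf("%s", ...) leaves behind. */
void discard_line(FILE *fp)
{
    int c;
    while ((c = fgetc(fp)) != '\n' && c != EOF)
        ;   /* drop the character */
}
```

In main.cpp that would mean calling discard_line(stdin) right after the scanf, before RunClient(). Note also that for a char array the argument should be cmd, not &cmd, and a field width such as "%15s" (for a 16-byte cmd) prevents overflow.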
fgets reads the input as well as the newline character. So when you input a number, it's like: 123\n.
atoi doesn't report errors when the conversion fails.
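Remove the newline character from the buffer (if there is one):
char buffer[5];
if (fgets(buffer, sizeof buffer, stdin) != NULL) {
size_t length = strlen(buffer);
if (length > 0 && buffer[length - 1] == '\n')
buffer[length - 1] = '\0'; // strip the trailing newline
}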
Then use strtol, which provides better error detection than atoi, to convert the string into a number.
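A minimal sketch of the strtol approach. parse_int is a hypothetical name; it accepts leading whitespace and the trailing newline that fgets keeps, and rejects empty input, trailing junk, and values outside the range of int.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse s as a base-10 int. Returns true and stores the value in *out
 * on success; returns false on empty input, trailing junk, or overflow. */
bool parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)
        return false;               /* no digits at all */
    if (*end == '\n')
        end++;                      /* allow the newline fgets keeps */
    if (*end != '\0')
        return false;               /* trailing characters */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;               /* out of range */
    *out = (int)v;
    return true;
}
```

Unlike atoi, this distinguishes a genuine 0 from an empty line such as the stray "\n", which parse_int rejects.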
char * fgets ( char * str, int num, FILE * stream );
Get string from stream.
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str. (This means that the \n is carried into str.)
A terminating null character is automatically appended after the characters copied to str.
Notice that fgets is quite different from gets: not only does fgets accept a stream argument, it also lets you specify the maximum size of str and includes any ending newline character in the string.
P.S. Try using a larger buffer.
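The num - 1 limit is easy to see with a 5-byte buffer like the question's: the input "1234\n" needs 6 bytes including the terminator, so fgets returns it in two pieces, and the second call gets only the newline (which atoi turns into 0). The sketch below counts the calls on a temporary file; fgets_calls_per_line is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Count how many fgets calls it takes to consume one full line from fp
 * when the buffer is only `size` bytes. Returns -1 if size is not in
 * the supported range 2..64. */
int fgets_calls_per_line(FILE *fp, int size)
{
    char buf[64];
    int calls = 0;
    if (size < 2 || size > (int)sizeof buf)
        return -1;
    while (fgets(buf, size, fp) != NULL) {
        calls++;
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n')
            break;              /* this chunk finished the line */
    }
    return calls;
}
```

With a 64-byte buffer the same line is consumed in a single call.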
