Getting strange characters from strncpy() function - c

I am supposed to load a list of names from a file, then find those names in a second file and load them into a structure along with some other data (for simplicity, I will load them into another array called "test").
The first part works fine: I open a file and load all the names into a two-dimensional array called namesArr.
The second part is where unexpected characters occur, and I can't understand why. Here is the code of the function:
void loadStructure(void){
    char line[MAX_PL_LENGTH], *found;
    int i, j=0;
    char test[20][20];
    FILE *plotPtr=fopen(PLOT_FILE_PATH, "r");
    if (plotPtr==NULL){perror("Error 05:\nError opening a file in loadStructure function. Check the file path"); exit(-5);}
    while(fgets(line, MAX_PL_LENGTH, plotPtr)!=NULL){ // This will load each line from the file into the array "line" until it reaches the end of the file.
        for(i=0; i<numOfNames; i++){ // Looping through the "namesArr" array, which contains the list of 20 character names.
            if((found=strstr(line, namesArr[i]))!=NULL){ // I use strstr() to find out whether any of those names appear in the particular line.
                printf("** %s", found); // Used for debugging.
                strncpy(test[j], found, strlen(namesArr[i])); j++; // Copying the newly found name to test[j] (copying only the name, by specifying its length, which is calculated by strlen).
            }
        }
    }
    fclose(plotPtr);
    printf("%s\n", test[0]);
    printf("%s\n", test[1]);
    printf("%s\n", test[2]);
}
This is the output I get:
...20 names were loaded from the "../Les-Mis-Names-20.txt".
** Leblanc, casting
** Fabantou seems to me to be better," went on M. Leblanc, casting
** Jondrette woman, as she stood
Leblanct╕&q
Fabantou
Jondretteⁿ  └
Process returned 0 (0x0) execution time : 0.005 s
Press any key to continue.
The question is, why am I getting characters like "╕&q" and "ⁿ  └" in the newly created array? And also, is there any other more efficient way to achieve what I am trying to do?

The problem is that strncpy does not store a terminating null in the target array if the length specified is not greater than the length of the source string (as is always the case here). So whatever garbage happened to be in the test array will remain there.
You can fix this specific problem by zeroing the test array, either when you declare it:
char test[20][20] = { { 0 } };
or as you use it:
memset(test[j], 0, 20);
strncpy(test[j], found, strlen(namesArr[i]));
but in general, it is best to avoid strncpy for this reason.
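One way to avoid strncpy here is to copy the known number of bytes with memcpy and write the terminator explicitly. The following is only a sketch: copy_prefix is a hypothetical helper name, not something from the question, and it assumes the caller guarantees that src holds at least n readable bytes (true when n comes from a strstr match).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the question): copy at most n bytes of src
 * into dst, whose capacity is dst_size, and always terminate the result.
 * Returns the number of bytes copied, not counting the terminator. */
static size_t copy_prefix(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (dst_size == 0)
        return 0;              /* no room even for the terminator */
    if (n > dst_size - 1)
        n = dst_size - 1;      /* truncate so the terminator still fits */
    memcpy(dst, src, n);       /* assumption: src has at least n readable bytes */
    dst[n] = '\0';             /* explicit terminator, so no leftover garbage */
    return n;
}
```

In the question's loop this could be called as copy_prefix(test[j], sizeof test[j], found, strlen(namesArr[i])).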

The length limitation for strncpy should be based on the target size, not the source length: that's the point of using it over strcpy, which uses only the source length. In your code
strncpy(test[j], found, strlen(namesArr[i]));
the length parameter is from the source array, which defeats the purpose of using strncpy. In addition, the nul terminator will not be present if the function copies the full limit of bytes, so the code should be
strncpy(test[j], found, 19); // limit to target size, leaving room for terminator
test[j][19] = '\0'; // add terminator (if copy did not complete)
Whether you loaded namesArr[] from file correctly is another potential issue, since you do not show the code.

Edited:
Slight modification to a previous answer:
1) Since you are working with C strings, make sure (since strncpy(...) does not do it for you) that you null terminate the buffer.
2) When using strncpy the length argument should represent the target string byte capacity - 1 (space for null terminator), not the source string length.
...
size_t len = strlen(found);
memset(test[j], 0, 20);
strncpy(test[j], found, 19);//maximum length (19) matches array size
//of target string -1 ( test[j] ).
if(len > 19) len = 19; //in case length of found is longer than the target string.
test[j][len] = 0; //terminator goes at index len, which is at most 19
...

In addition to what Chris Dodd said, here is a quote from man strncpy:
The strncpy() function is similar [to the strcpy() function], except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
Since the size parameter in your strncpy call is the length of the string, this will not include the null byte at the end of the string and thus your destination string will not be null-terminated from this call.
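If the goal is simply "store the first n characters of src as a proper string", snprintf with a "%.*s" precision is another option, since it always writes a terminator when the buffer size is non-zero. This is a sketch under assumptions: copy_name is a hypothetical helper, and n is assumed to fit in an int.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the question): write the first n characters
 * of src into dst as a terminated string.
 * Returns 1 if everything fit, 0 if the result was truncated or an error occurred. */
static int copy_name(char *dst, size_t dst_size, const char *src, size_t n)
{
    /* "%.*s" prints at most (int)n characters of src;
     * assumption: n is small enough to be represented as an int. */
    int r = snprintf(dst, dst_size, "%.*s", (int)n, src);
    return r >= 0 && (size_t)r < dst_size;
}
```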

Related

How to check if there is a `\0` character in a filename using C?

I'd like to write a function like this:
int validate_file_name(char *filename)
{
//...
}
which will:
return 1 if there was no \0 character in the filename,
0 otherwise.
I thought it might be achieved using a simple for(size_t i = 0; i < strlen(filename); i++), but I don't know how to determine how many characters I have to check.
I can't use strlen() because it will terminate on the first occurrence of a \0 character.
How should I approach this problem?
Clarification:
I am trying to apply these guidelines to a filename I receive. If you should avoid putting a \0 in a filename, how could you validate this if you've got no size parameter.
Moreover, there are strings with multiple \0 characters, like here: http://www.gnu.org/software/libc/manual/html_mono/libc.html#Argz-and-Envz-Vectors. Still, I had no idea that it is impossible to determine their length if it is not explicitly provided.
Conclusion:
There is no way to determine the length of a string that is not null-terminated, unless of course you already know the length or you resort to dirty hacks such as: Checking if a pointer is allocated memory or not.
You are trying to solve a problem that does not need to be solved.
A file name is a string. In C, a "string" is by definition "a contiguous sequence of characters terminated by and including the first null character".
It is impossible to have a string or a file name with a null character embedded in it.
It's possible to have a sequence of characters with an embedded null character. For example:
char buf[] = "foo\0bar.txt";
buf is an array of 12 characters; the characters at positions 3 and 11 are both null characters. If you treat buf as a string, for example by calling
fopen(buf, "r")
it will be treated as a string with a length of 3 (the length of a string does not include the terminating null character).
If you're working with character arrays that may or may not contain strings, then it makes sense to do what you're asking. You would need to keep track of the size of the buffer separately from the address of the initial character, either by passing an additional argument or by wrapping the pointer and the length in a structure.
But if you're dealing with file names, it's almost certainly best just to deal with strings and assume that whatever char* value is passed to your function points to a valid string. If it doesn't (if there is no null character anywhere in the array), that's the caller's fault, and not something you can reasonably check.
(Incidentally, Unix/Linux file systems explicitly forbid null characters in file names. The / character is also forbidden, because it's used as a directory name delimiter. Windows file systems have even stricter rules.)
One last point: NULL is (a macro that expands to) a null pointer constant. Please don't use the term NULL to refer to the null character '\0'.
The answer is that you can't write a function that does that if you don't know the length of the string.
To determine the length of the string, strlen() searches for the '\0' character; if it is not present, the behavior is undefined.
If you knew the length of the string then,
for (int i = 0 ; i < length ; ++i)
{
    if (string[i] == '\0')
        return 0; /* found a null character: the name is not valid */
}
return 1; /* no null character among the first length bytes */
would work. If you don't know the length of the string, then the loop condition would have to be
for (int i = 0 ; string[i] != '\0' ; ++i)
which means that searching for the '\0' makes no sense, because its presence is what makes all the other string functions work properly.
If the string is not null-terminated, what else is it terminated by? And if you don't know that, what is its length? If you know the answers to these questions, you know the answer to your question.
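For the case where the buffer length is passed alongside the pointer, the check reduces to a single memchr call. This is a minimal sketch: validate_file_name_n is a hypothetical variant of the function in the question, with an added len parameter that the original signature does not have.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of validate_file_name that takes the buffer length.
 * Returns 1 if none of the first len bytes is '\0', 0 otherwise. */
static int validate_file_name_n(const char *buf, size_t len)
{
    /* memchr returns NULL when the byte does not occur in the first len bytes */
    return memchr(buf, '\0', len) == NULL;
}
```

Without the len argument there is nothing for the function to check against, which is the point made in the answers above.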

strncpy doesn't always null-terminate

I am using the code below:
char filename[ 255 ];
strncpy( filename, getenv( "HOME" ), 235 );
strncat( filename, "/.config/stationlist.xml", 255 );
Get this message:
(warning) Dangerous usage of strncat - 3rd parameter is the maximum number of characters to append.
(error) Dangerous usage of 'filename' (strncpy doesn't always null-terminate it).
I typically avoid using str*cpy() and str*cat(). You have to contend with boundary conditions, arcane API definitions, and unintended performance consequences.
You can use snprintf() instead. You only have to contend with the size of the destination buffer. It is also safer in that it will not overflow, and it will always NUL-terminate for you.
char filename[255];
const char *home = getenv("HOME");
if (home == 0) home = ".";
int r = snprintf(filename, sizeof(filename), "%s%s", home, "/.config/stationlist.xml");
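if (r < 0) {
    /* handle error... (checked first, because a negative int compared with a size_t would be converted to a huge unsigned value) */
} else if ((size_t)r >= sizeof(filename)) {
    /* need a bigger filename buffer... */
}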
You may overflow filename with your strncat call.
Use:
strncat(filename, "/.config/stationlist.xml",
sizeof filename - strlen(filename) - 1);
Also be sure to null terminate your buffer after strncpy call:
strncpy( filename, getenv( "HOME" ), 235 );
filename[235] = '\0';
as strncpy does not null terminate its destination buffer if the length of the source is greater than or equal to the maximum number of characters to copy.
man strncpy has this to say:
Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null terminated.
If it encounters the 0 byte in the source before it exhausts the maximum length, it will be copied. But if the maximum length is reached before the first 0 in the source, the destination will not be terminated. Best to make sure it is yourself after strncpy() returns...
Both strncpy() and (even more so) strncat() have non-obvious behaviours and you would be best off not using either.
strncpy()
If your target string is, for sake of argument, 255 bytes long, strncpy() will always write to all 255 bytes. If the source string is shorter than 255 bytes, it will zero pad the remainder. If the source string is longer than 255 bytes, it will stop copying after 255 bytes, leaving the target without a null terminator.
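Both behaviours can be observed with a small buffer. The sketch below uses a hypothetical helper, strncpy_terminated, that runs strncpy with the full buffer size and then reports whether a terminator ended up anywhere in the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (for illustration only): strncpy src into dst using the
 * full buffer size, then report whether dst holds a terminated string.
 * Returns 1 if a '\0' was stored somewhere in dst[0..dst_size-1], else 0. */
static int strncpy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size);   /* always writes exactly dst_size bytes */
    return memchr(dst, '\0', dst_size) != NULL;
}
```

With an 8-byte buffer, the source "abc" leaves bytes 3 through 7 all zero, while the source "abcdefgh" fills all 8 bytes and leaves no terminator.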
strncat()
The size argument for most of the 'sized' functions (strncpy(), memcpy(), memmove(), etc) is the number of bytes in the target string (memory). With strncat(), the size is the amount of space left after the end of the string that's already in the target. Therefore, you can only safely use strncat() when you know both how big the target buffer is (S) and how long the target string currently is (L). The safe parameter to strncat() is then S-L (we'll worry about whether there's an off-by-one some other time). But given that you know L, there is no point in making strncat() skip the L characters; you could have passed target+L as the place to start, and simply copied the data. And you could use memmove() or memcpy(), or you could use strcpy(), or even strncpy(). If you don't know the length of the source string, you've got to be confident that it makes sense to truncate it.
Analysis of code in question
char filename[255];
strncpy(filename, getenv("HOME"), 235);
strncat(filename, "/.config/stationlist.xml", 255);
The first line is unexceptionable unless the size is deemed too small (or you run the program in a context where $HOME is not set), but that's out of scope for this question. The call to strncpy() does not use sizeof(filename) for the size, but rather an arbitrarily small number. It isn't the end of the world, but there's no guarantee that the last 20 bytes of the variable are zero bytes (or even that any of them is a zero byte), in general. Under some circumstances (filename is a global variable, previously unused) the zeros might be guaranteed.
The strncat() call tries to append 24 characters to the end of the string in filename that might already be 232-234 bytes long, or that might be arbitrarily longer than 235 bytes. Either way, that is a guaranteed buffer overflow. The usage of strncat() also falls directly into the trap about its size. You've said that it is OK to add up to 255 characters beyond the end of what's already in filename, which is blatantly wrong (unless the string from getenv("HOME") happens to be empty).
Safer code:
char filename[255];
static const char config_file[] = "/.config/stationlist.xml";
const char *home = getenv("HOME");
size_t len = (home != NULL) ? strlen(home) : 0;
if (home == NULL || len > sizeof(filename) - sizeof(config_file))
{
    /* error: HOME is not set, or the file name will be too long */
}
else
{
    memmove(filename, home, len);
    memmove(filename+len, config_file, sizeof(config_file));
}
There will be those who insist that 'memcpy() is safe because the strings cannot overlap', and at one level they're correct, but overlap should be a non-issue and with memmove(), it is a non-issue. So, I use memmove() all the time...but I've not done the timing measurements to see how big of a problem it is, if it is a problem at all. Maybe the other people have done the measurements.
Summary
Don't use strncat().
Use strncpy() cautiously (noting its behaviour on very big buffers!).
Plan to use memmove() or memcpy() instead; if you can do the copy safely, you know the sizes necessary to make this sensible.
1) Your strncpy does not necessarily null-terminate filename. In fact, if getenv("HOME") is 235 characters or longer, it won't.
2) Your strncat() may attempt to extend filename beyond 255 characters, because, as it says,
3rd parameter is the maximum number of characters to append.
(not the total allowed length of dst)
strncpy(Copied_to, Copied_from, sizeof_input) can leave garbage values after the copied characters in a character array (this does not apply to the string type). To work around it, print the array with a for loop that traverses it rather than simply using cout<<var;
for(i=0;i<size;i++){cout<<var[i];}
I couldn't find a workaround other than this traversal on a Windows system using the MinGW compiler.
Null termination did not solve the problem for me. Online compilers work just fine.

How do I properly store characters in an array using read?

I have written the following code, and I don't understand why read is not storing the characters the way I expect:
char temp;
char buf[256];
while(something) {
    read (in,&temp, 1);
    buf[strlen(buf)] = temp;
}
If I print temp and the last place of the buf array as I am reading, sometimes they don't match up. For example maybe the character is 'd' but the array contains % or the character is 0 and the array contains .
I am reading less than 256 characters but it doesn't matter because I am printing as I am reading.
Am I missing something obvious?
Yes, you're not initializing buf -- strlen(buf) is undefined. You should initialize it like so:
buf[0] = 0;
Also, it's better to keep track of the length instead of calling strlen each iteration to avoid a Shlemiel the painter algorithm.
You should also be checking for errors in the call to read(2) -- if it returns -1 or 0, you should break out of your loop, since it means either an error occurred or you reached the end of the file/input stream.
Don't use strlen in this code. strlen relies on its argument being a null-terminated C string. So unless you initialize your entire buffer to 0, this code doesn't work.
At any rate strlen isn't a good choice to use when buffering data, even if you know that you're working with printable string data, if only because strlen will traverse the string every time just to get your length.
Keep a separate counter, named e.g. numRead, only append to buf at the numRead position, and increment numRead by the amount that you read.
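The counter-based approach might look like the sketch below. It is not the asker's code: read_into is a hypothetical helper, fd stands in for the descriptor called in in the question, and interrupted reads (EINTR) are simply treated as errors.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read one byte at a time from fd into buf, tracking the
 * position in numRead instead of calling strlen. Stops at end of input or when
 * cap - 1 bytes have been stored, then terminates the buffer.
 * Returns the number of bytes stored, or -1 on a read error or zero capacity. */
static long read_into(int fd, char *buf, size_t cap)
{
    size_t numRead = 0;
    char temp;

    if (cap == 0)
        return -1;                  /* no room for the terminator */
    while (numRead < cap - 1) {
        ssize_t r = read(fd, &temp, 1);
        if (r < 0)
            return -1;              /* read error */
        if (r == 0)
            break;                  /* end of file / input stream */
        buf[numRead++] = temp;      /* append at the tracked position */
    }
    buf[numRead] = '\0';            /* buf is now a valid C string */
    return (long)numRead;
}
```

Because numRead is always known, buf never needs to be scanned, and it is a valid string as soon as the function returns.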

C programming: Replace an inner string using strcpy?

I've copied an HTML file into an array using the following code:
fseek(board, 0, SEEK_END);
long int size = ftell(board);
rewind(board);
char *sourcecode = calloc(size+1, sizeof(char));
fread(sourcecode, 1, size, board);
Now my goal is to replace a certain comment in the array with the already defined char string 'king'. E.g.
< html code>< !comment>< more html code>
to
< html code>king< more html code>
I'm using the following code:
find_pointer = strstr(sourcecode, text2find);
strcpy(find_pointer, king);
printf("%s", sourcecode);
where text2find = "< !comment>";
However, when I print, it is evident that all the characters past 'king' have been erased, as if a terminating character had been added automatically. How can I fix this so that < more html code> remains in place?
EDIT:
I used strncpy and set the number of characters so that the terminating character was not added. Is this the best method?
You basically can't do that, unless the stuff you want to replace is exactly the same size. In which case you can use either memcpy or strncpy.
If the sizes are different, you could try something along the lines of:
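char *buffer = malloc(size); // size should be big enough to store the whole final html code, including the terminating null
find_pointer = strstr(sourcecode, text2find);
size_t len = find_pointer - sourcecode; // length of the part before the comment
memcpy (buffer, sourcecode, len);
memcpy (buffer + len, "king", 4);
// copy the rest, starting just past the comment; the +1 also copies the terminating null
memcpy (buffer + len + 4, find_pointer + strlen(text2find), strlen(sourcecode) - len - strlen(text2find) + 1);
free(sourcecode);
sourcecode = buffer;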
Well, strcpy adds a 0-terminator. So although the remainder of the string remains in place, the standard string handling functions don't see it anymore because they stop at the 0-terminator. You can either manually overwrite it with a space or use memcpy instead of strcpy.
Replacing characters in a C string is painful, because you perform manipulations at a very low level, compared to, say, C++. You literally need to work out an algorithm for it!
First, observe that in-place replacement is not always possible: if the substring that you are replacing is shorter than the replacement, you would need to allocate more memory. It is easier to allocate the memory for the result either way, so you may proceed as follows:
1) Find the length of the string after the replacement. For that, you'd need to find the beginning and the end of the comment you're replacing, and do the math.
2) Next, you allocate a new chunk of memory for the result, and memcpy the source up to the replacement point into it.
3) Now you copy the replacement string, and finally the ending portion of the source into the result.
4) Finally, you free the buffer of the source string, and return the result.
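The steps above can be sketched roughly as follows. replace_first is a hypothetical helper name, it handles only the first occurrence, and unlike the last step it leaves freeing the source buffer to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of src in which the first
 * occurrence of target is replaced by replacement. Returns NULL if target does
 * not occur in src or if allocation fails. The caller frees the result. */
static char *replace_first(const char *src, const char *target, const char *replacement)
{
    const char *hit = strstr(src, target);
    if (hit == NULL)
        return NULL;

    size_t prefix_len = (size_t)(hit - src);      /* text before the match */
    size_t target_len = strlen(target);
    size_t repl_len   = strlen(replacement);
    size_t suffix_len = strlen(hit + target_len); /* text after the match */

    char *result = malloc(prefix_len + repl_len + suffix_len + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, src, prefix_len);
    memcpy(result + prefix_len, replacement, repl_len);
    /* suffix_len + 1 also copies the terminating null character */
    memcpy(result + prefix_len + repl_len, hit + target_len, suffix_len + 1);
    return result;
}
```

For the question's buffer, a call such as replace_first(sourcecode, text2find, king) would return the new text, after which the old sourcecode can be freed.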
First, you should use strncpy (or, better yet, strlcpy if available) because it actually performs bounds checking (i.e., it copies only a specified, supplied number of characters). Otherwise you may end up attempting to copy memory past the end of the destination string, resulting in undefined and potentially destructive behavior. Second, even if you were to use a function like strncpy or memcpy to avoid copying the terminating null character, your destination string would not be properly formatted because the string that you are trying to overwrite has a different length than the string that you're attempting to copy.

second memcpy() attaches previous memcpy() array to it

I have a little problem here with memcpy()
When I write this
char ipA[15], ipB[15];
size_t b = 15;
memcpy(ipA,line+15,b);
It copies b bytes from array line starting at the 15th element (fine, this is what I want)
memcpy(ipB,line+31,b);
This copies b bytes from line starting at the 31st element, but it also attaches the result of the previous command, i.e. ipA, to it.
Why? The size of ipB is 15, so it shouldn't have enough space to hold anything else. What's happening here?
result for ipA is 192.168.123.123
result for ipB becomes 205.123.123.122 192.168.123.123
Where am I going wrong? I don't actually know a lot about memory allocation in C.
It looks like you're not null-terminating the string in ipA. The compiler has put the two variables next to one another in memory, so string operations keep reading until the first null terminator, which lies somewhere after the second array (wherever the next 0 occurs in memory).
Try:
char ipA[16], ipB[16];
size_t b = 15;
memcpy(ipA,line+15,b);
ipA[15] = '\0';
memcpy(ipB,line+31,b);
ipB[15] = '\0';
printf("ipA: %s\nipB: %s\n", ipA, ipB);
This should confirm whether this is the problem. Obviously you could make the code a bit more elegant than my test code above. As an alternative to manually terminating, you could use printf("%.*s\n", (int)b, ipA); or similar to force printf to print the correct number of characters (the precision argument for * must be an int, hence the cast).
Are you checking the content of the arrays by doing printf("%s", ipA)? If so, you'll end up with the described effect, since your array is interpreted as a C string that is not null-terminated. Do this instead: printf("%.*s", (int)sizeof(ipA), ipA)
Character strings in C require a terminating mark. It is the char value 0.
As your two character strings are contiguous in memory, if you don't terminate the first character string, then when reading it, you will continue until memory contains the end-of-string character.

Resources