Different ways to calculate string length - c

A comment on one of my answers has left me a little puzzled. When trying to compute how much memory is needed to concat two strings to a new block of memory, it was said that using snprintf was preferred over strlen, as shown below:
size_t length = snprintf(0, 0, "%s%s", str1, str2);
// preferred over:
size_t length = strlen(str1) + strlen(str2);
Can I get some reasoning behind this? What is the advantage, if any, and would one ever see one result differ from the other?

I was the one who said it, and I left out the +1 in my comment which was written quickly and carelessly, so let me explain. My point was merely that you should use the pattern of using the same method to compute the length that will eventually be used to fill the string, rather than using two different methods that could potentially differ in subtle ways.
For example, if you had three strings rather than two, and two or more of them overlapped, it would be possible that strlen(str1)+strlen(str2)+strlen(str3)+1 exceeds SIZE_MAX and wraps past zero, resulting in under-allocation and truncation of the output (if snprintf is used) or extremely dangerous memory corruption (if strcpy and strcat are used).
snprintf will return -1 with errno=EOVERFLOW when the resulting string would be longer than INT_MAX, so you're protected. You do need to check the return value before using it though, and add one for the null terminator.
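A minimal sketch of that pattern, assuming a hypothetical helper named concat_alloc (not part of any library): the same snprintf call measures the length and then fills the buffer, and the return value is checked before it is used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd concatenation of a and b,
   or NULL on error. The caller frees the result. */
char *concat_alloc(const char *a, const char *b)
{
    /* Measure with the same call that will later fill the buffer. */
    int len = snprintf(NULL, 0, "%s%s", a, b);
    if (len < 0)
        return NULL;                 /* encoding error or overflow */

    size_t size = (size_t)len + 1;   /* +1 for the terminating null */
    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    /* Fill the buffer; the result should match the measured length. */
    if (snprintf(out, size, "%s%s", a, b) != len) {
        free(out);
        return NULL;
    }
    return out;
}
```

The cost is that the format string is parsed twice, which is the trade-off discussed in the other answers.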

If you only need to determine how big the concatenation of the two strings would be, I don't see any particular reason to prefer snprintf, since the minimum work needed to determine the total length of the two strings is exactly what the two strlen calls do. snprintf will almost surely be slower, because it has to check the parameters and parse the format string in addition to walking the two strings and counting the characters.
... but... it may be an intelligent move to use snprintf if you are in a scenario where you want to concatenate two strings and have a static, not too big buffer to handle normal cases, but you can fall back to a dynamically allocated buffer in case of big strings, e.g.:
/* static buffer "big enough" for most cases */
char buffer[256];
/* pointer used in the part where work on the string is actually done */
char *outputStr = buffer;
/* try to concatenate, get the length of the resulting string */
int length = snprintf(buffer, sizeof(buffer), "%s%s", str1, str2);
if (length < 0)
{
    /* error, panic and death */
}
else if ((size_t)length > sizeof(buffer) - 1)
{
    /* buffer wasn't enough, allocate dynamically */
    outputStr = malloc(length + 1);
    if (outputStr == NULL)
    {
        /* allocation error, death and panic */
    }
    /* pass length + 1 so the last character is not truncated */
    if (snprintf(outputStr, length + 1, "%s%s", str1, str2) < 0)
    {
        /* error, the world is doomed */
    }
}
/* here do whatever you want with outputStr */
if (outputStr != buffer)
    free(outputStr);

One advantage would be that the input strings are only scanned once (inside the snprintf()) instead of twice for the strlen/strcpy solution.
Actually, on rereading this question and the comment on your previous answer, I don't see what the point is in using snprintf() just to calculate the concatenated string length. If you're actually doing the concatenation, my above paragraph applies.

You need to add 1 to the strlen() example. Remember that you need to allocate space for the terminating nul byte.
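For comparison, here is a sketch of the strlen-based version with the +1 included. The name concat_strlen is made up for illustration, and the wrap-around guard is my own addition in response to the overflow concern raised in another answer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenates a and b into malloc'd memory.
   Returns NULL on overflow or allocation failure. */
char *concat_strlen(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* Guard against la + lb + 1 wrapping past SIZE_MAX. */
    if (la > SIZE_MAX - 1 - lb)
        return NULL;

    char *out = malloc(la + lb + 1);   /* +1 for the terminating nul */
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);       /* copies b's terminator too */
    return out;
}
```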

So snprintf( ) gives me the size a string would have been. That means I can malloc( ) space for that guy. Hugely useful.
I wanted (but did not find until now) this function of snprintf( ) because I format tons of strings for output later; but I wanted not to have to assign static bufs for the outputs because it's hard to predict how long the outputs will be. So I ended up with a lot of 4096-long char arrays :-(
But now -- using this newly-discovered (to me) snprintf( ) char-counting function, I can malloc( ) output bufs AND sleep at night, both.
Thanks again and apologies to the OP and to Matteo.

EDIT: random, mistaken nonsense removed. Did I say that?
EDIT: Matteo in his comment below is absolutely right and I was absolutely wrong.
From C99:
2 The snprintf function is equivalent to fprintf, except that the output is written into
an array (specified by argument s) rather than to a stream. If n is zero, nothing is written,
and s may be a null pointer. Otherwise, output characters beyond the n-1st are
discarded rather than being written to the array, and a null character is written at the end
of the characters actually written into the array. If copying takes place between objects
that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
Thank you, Matteo, and I apologize to the OP.
This is great news because it gives a positive answer to a question I'd asked here only three weeks ago. I can't explain why I didn't read all of the answers, which gave me what I wanted. Awesome!
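To make the quoted return-value rule concrete, here is a small sketch. The helper name copy_fits is hypothetical; it only wraps the "nonnegative and less than n" test from the standard text.

```c
#include <stdio.h>

/* Hypothetical helper: formats src into dst (n bytes) and reports
   whether it fit. Returns 1 if complete, 0 if truncated, -1 on error. */
int copy_fits(char *dst, size_t n, const char *src)
{
    int r = snprintf(dst, n, "%s", src);
    if (r < 0)
        return -1;              /* encoding error */
    return (size_t)r < n;       /* complete only if r < n */
}
```

With a 4-byte buffer and the source "hello", snprintf stores "hel" plus the terminator and still reports 5, which is how the caller learns how much space was really needed.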

The "advantage" that I can see here is that strlen(NULL) might cause a segmentation fault, while (at least glibc's) snprintf() handles NULL parameters without failing.
Hence, with glibc-snprintf() you don't need to check whether one of the strings is NULL, although length might be slightly larger than needed, because (at least on my system) printf("%s", NULL); prints "(null)" instead of nothing.
I wouldn't recommend using snprintf() instead of strlen() though. It's just not obvious. A much better solution is a wrapper for strlen() which returns 0 when the argument is NULL:
size_t my_strlen(const char *str)
{
return str ? strlen(str) : 0;
}

Related

C Why is an unrelated/undeclared variable influencing the output of another?

When the character array substring[#] is set as [64], the file outputs an additional character. The additional character varies with each compile. Sometimes es?, sometimes esx among others.
If I change the [64] to any other number (at least the ones I've tried: 65, 256,1..) it outputs correctly as es.
Even more strange, if I leave the unused/undeclared character array char newString[64] in this file, it outputs the correct substring es even with the 64.
How does the seemingly arbitrary size of 64 affect the out?
How does a completely unrelated character array (newString) influence how another character array is output?
int main () {
    char string[64];
    char newString[64];
    char substring[64];
    fgets(string,64,stdin);
    strncpy(substring, string+1, 1);
    printf("%s\n", substring);
    return 0;
}
The problem is, strncpy() will not copy the null terminator because you've asked it not to.
Using strncpy() is safe and dangerous at the same time, because it will not always copy the null terminator. Using it for a single byte is also pointless; instead do this:
substring[0] = string[1];
substring[1] = '\0';
and it will work.
You should read the manual page strncpy(3) to understand exactly what I mean. Reading the manual carefully every time will make you a better programmer in a shorter time.
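If more than one character has to be copied, the same idea generalizes. This is only a sketch, with copy_prefix as a made-up name; it assumes dst has room for count + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: copies at most count characters from src into
   dst and always terminates. dst must hold at least count + 1 bytes. */
void copy_prefix(char *dst, const char *src, size_t count)
{
    strncpy(dst, src, count);
    /* strncpy adds no terminator when src has count or more characters,
       so write one explicitly. */
    dst[count] = '\0';
}
```

The output then no longer depends on whatever bytes happened to be in the uninitialized array.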

strncpy doesn't always null-terminate

I am using the code below:
char filename[ 255 ];
strncpy( filename, getenv( "HOME" ), 235 );
strncat( filename, "/.config/stationlist.xml", 255 );
Get this message:
(warning) Dangerous usage of strncat - 3rd parameter is the maximum number of characters to append.
(error) Dangerous usage of 'filename' (strncpy doesn't always null-terminate it).
I typically avoid using str*cpy() and str*cat(). You have to contend with boundary conditions, arcane API definitions, and unintended performance consequences.
You can use snprintf() instead. You only have to contend with the size of the destination buffer. It is also safer in that it will not overflow and will always NUL terminate for you.
char filename[255];
const char *home = getenv("HOME");
if (home == 0) home = ".";
int r = snprintf(filename, sizeof(filename), "%s%s", home, "/.config/stationlist.xml");
if (r < 0) {
    /* handle error... (checked first: a negative int compared with a size_t converts to a huge value) */
} else if ((size_t)r >= sizeof(filename)) {
    /* need a bigger filename buffer... */
}
You may overflow filename with your strncat call.
Use:
strncat(filename, "/.config/stationlist.xml",
sizeof filename - strlen(filename) - 1);
Also be sure to null terminate your buffer after strncpy call:
strncpy( filename, getenv( "HOME" ), 235 );
filename[235] = '\0';
as strncpy does not null terminate its destination buffer if the length of the source is greater than or equal to the maximum number of characters to copy.
man strncpy has this to say:
Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null terminated.
If it encounters the 0 byte in the source before it exhausts the maximum length, it will be copied. But if the maximum length is reached before the first 0 in the source, the destination will not be terminated. Best to make sure it is yourself after strncpy() returns...
Both strncpy() and (even more so) strncat() have non-obvious behaviours and you would be best off not using either.
strncpy()
If your target string is, for sake of argument, 255 bytes long, strncpy() will always write to all 255 bytes. If the source string is shorter than 255 bytes, it will zero pad the remainder. If the source string is 255 bytes or longer, it will stop copying after 255 bytes, leaving the target without a null terminator.
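Both behaviours can be observed with a short sketch. The names has_terminator and copy_checked are hypothetical; the second one forces termination and tells the caller when the source was cut.

```c
#include <string.h>

/* Returns 1 if buf contains a null byte within its first n bytes. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Hypothetical helper: strncpy into a buffer of n bytes, then make sure
   the result is terminated. Returns 1 if src fit, 0 if it was truncated. */
int copy_checked(char *dst, size_t n, const char *src)
{
    if (n == 0)
        return 0;
    strncpy(dst, src, n);            /* always writes all n bytes */
    if (!has_terminator(dst, n)) {
        dst[n - 1] = '\0';           /* src filled the buffer: cut it */
        return 0;
    }
    return 1;
}
```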
strncat()
The size argument for most of the 'sized' functions (strncpy(), memcpy(), memmove(), etc) is the number of bytes in the target string (memory). With strncat(), the size is the amount of space left after the end of the string that's already in the target. Therefore, you can only safely use strncat() when you know both how big the target buffer is (S) and how long the target string currently is (L). The safe parameter to strncat() is then S-L (we'll worry about whether there's an off-by-one some other time). But given that you know L, there is no point in making strncat() skip the L characters; you could have passed target+L as the place to start, and simply copied the data. And you could use memmove() or memcpy(), or you could use strcpy(), or even strncpy(). If you don't know the length of the source string, you've got to be confident that it makes sense to truncate it.
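As a sketch of the "you already know L" argument: once the current length is known, the append can be a plain bounded copy to target+L. The name append_checked is invented for this example, and refusing to truncate is my own design choice.

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dst, where dst is a
   buffer of size bytes holding a terminated string. Returns 0 on
   success, -1 if the result would not fit (dst is left unchanged). */
int append_checked(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);       /* L: current string length */
    size_t add = strlen(src);

    /* Need used + add + 1 <= size, written so it cannot wrap. */
    if (used >= size || add >= size - used)
        return -1;

    memcpy(dst + used, src, add + 1);   /* includes the terminator */
    return 0;
}
```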
Analysis of code in question
char filename[255];
strncpy(filename, getenv("HOME"), 235);
strncat(filename, "/.config/stationlist.xml", 255);
The first line is unexceptionable unless the size is deemed too small (or you run the program in a context where $HOME is not set), but that's out of scope for this question. The call to strncpy() does not use sizeof(filename) for the size, but rather an arbitrarily small number. It isn't the end of the world, but there's no guarantee that the last 20 bytes of the variable are zero bytes (or even that any of them is a zero byte), in general. Under some circumstances (filename is a global variable, previously unused) the zeros might be guaranteed.
The strncat() call tries to append 24 characters to the end of the string in filename that might already be 232-234 bytes long, or that might be arbitrarily longer than 235 bytes. Either way, that is a guaranteed buffer overflow. The usage of strncat() also falls directly into the trap about its size. You've said that it is OK to add up to 255 characters beyond the end of what's already in filename, which is blatantly wrong (unless the string from getenv("HOME") happens to be empty).
Safer code:
char filename[255];
static const char config_file[] = "/.config/stationlist.xml";
const char *home = getenv("HOME");
size_t len = strlen(home);
if (len > sizeof(filename) - sizeof(config_file))
{
    /* ...error: file name will be too long... */
}
else
{
    memmove(filename, home, len);
    memmove(filename+len, config_file, sizeof(config_file));
}
There will be those who insist that 'memcpy() is safe because the strings cannot overlap', and at one level they're correct, but overlap should be a non-issue and with memmove(), it is a non-issue. So, I use memmove() all the time...but I've not done the timing measurements to see how big of a problem it is, if it is a problem at all. Maybe the other people have done the measurements.
Summary
Don't use strncat().
Use strncpy() cautiously (noting its behaviour on very big buffers!).
Plan to use memmove() or memcpy() instead; if you can do the copy safely, you know the sizes necessary to make this sensible.
1) Your strncpy does not necessarily null-terminate filename. In fact, if getenv("HOME") is 235 characters or longer (that is, none of its first 235 bytes is a 0), it won't.
2) Your strncat() may attempt to extend filename beyond 255 characters, because, as it says,
3rd parameter is the maximum number of characters to append.
(not the total allowed length of dst)
strncpy(Copied_to, Copied_from, sizeof_input) leaves garbage values after the copied characters when the character array is not used as a string type. To solve it, output the characters with a for loop traversing the character array rather than simply using cout<<var;
for(i=0;i<size;i++){cout<<var[i];}
I couldn't find a workaround for traversal on a Windows system using the MinGW compiler.
Null termination does not solve the problem. Online compilers work just fine.

Why would one add 1 or 2 to the second argument of snprintf?

What is the role of 1 and 2 in these snprintf functions? Could anyone please explain it
snprintf(argv[arg++], strlen(pbase) + 2 + strlen("ivlpp"), "%s%ccivlpp", pbase, sep);
snprintf(argv[arg++], strlen(defines_path) + 1, "-F\"%s\"", defines_path);
The role of the +2 is to allow for a terminal null and the embedded character from the %c format, so there is exactly the right amount of space for formatting the first string. But (as 6502 points out), the actual size provided is one byte shorter than needed because the strlen("ivlpp") doesn't match the civlpp in the format itself. This means that the last character (the second 'p') will be truncated in the output.
The role of the +1 is also to cause snprintf() to truncate the formatted data. The format string contains 4 literal characters, and you need to allow for the terminal null, so the code should allocate strlen(defines_path)+5. As it is, the snprintf() truncates the data, leaving off 4 characters.
I'm dubious about whether the code really works reliably...the memory allocation is not shown, but will have to be quite complex - or it will have to over-allocate to ensure that there is no danger of buffer overflow.
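One way to avoid hand-counting the literal characters is to let snprintf() measure the formatted length itself. This is only a sketch: make_define_arg is a hypothetical helper, not code from the project in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds the -F"<path>" argument in freshly
   allocated memory. Returns NULL on failure; the caller frees it. */
char *make_define_arg(const char *defines_path)
{
    /* With a NULL buffer and size 0, snprintf only reports the length. */
    int len = snprintf(NULL, 0, "-F\"%s\"", defines_path);
    if (len < 0)
        return NULL;

    char *arg = malloc((size_t)len + 1);   /* +1 for the terminal null */
    if (arg == NULL)
        return NULL;

    snprintf(arg, (size_t)len + 1, "-F\"%s\"", defines_path);
    return arg;
}
```

For a three-character path the measured length is 7, which matches the strlen + 4 literal characters counted above, plus one byte for the null.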
Since a comment from the OP says:
I don't know the use of snprintf()
int snprintf(char *restrict s, size_t n, const char *restrict format, ...);
The snprintf() function formats data like printf(), but it writes it to a string (the s in the name) instead of to a file. The n in the name indicates that the function is told exactly how long the string is, and snprintf() therefore ensures that the output data is null terminated (unless the length is 0). It reports how long the string should have been; if the reported value is equal to or larger than the size provided, you know the data got truncated.
So, overall, snprintf() is a relatively safe way of formatting strings, provided you use it correctly. The examples in the question do not demonstrate 'using it correctly'.
One gotcha: if you work on MS Windows, be aware that the MSVC implementation of snprintf() does not exactly follow the C99 standard (and it looks a bit as though MS no longer provides snprintf() at all; only various alternatives such as _snprintf()). I forget the exact deviation, but I think it means that the string is not properly null-terminated in all circumstances when it should be longer than the space provided.
With locally defined arrays, you normally use:
nbytes = snprintf(buffer, sizeof(buffer), "format...", ...);
With dynamically allocated memory, you normally use:
nbytes = snprintf(dynbuffer, dynbuffsize, "format...", ...);
In both cases, you check whether nbytes contains a non-negative value less than the size argument; if it does, your data is OK; if the value is equal to or larger, then your data got chopped (and you know how much space you needed to allocate).
The C99 standard says:
The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
The programmer whose code you are reading doesn't know how to use snprintf properly. The second argument is the buffer size, so it should almost always look like this:
snprintf(buf, sizeof buf, "..." ...);
The above is for situations where buf is an array, not a pointer. In the latter case you have to pass the buffer size along:
snprintf(buf, bufsize, "...", ...);
Computing the buffer size is unneeded.
By the way, since you tagged the question as qt-related: there is a very nice QString class that you should use instead.
At a first look both seem incorrect.
In the first case the correct computation would be path + sep + name + NUL, so 2 would seem OK, but for the name the strlen call is using ivlpp while the formatting code is instead using civlpp, which is one char longer.
In the second case the number of chars added is 4 (-F"") so the number to add should be 5 because of the ending NUL.

Writing into a file

I have a very simple question regarding file write.
I have this program:
char buf[20];
size_t nbytes;
strcpy(buf, "All that glitters is not gold\n");
fd= open("test_file.txt",O_WRONLY);
write(fd,buf,strlen(buf));
close(fd);
What I am confused about is that when I open the file test_file.txt after running this program, I see some characters like ^C^#^#^#^^^# after the line "All that glitters is not". Notice that a portion of buf is not written and those characters appear instead. Why is that so?
You're writing more than 19 chars in that buffer. Once you've done that, the behavior of your program is undefined. It could do whatever it wants.
Allocate a large enough buffer. It has to be able to fit all the letters plus a terminating 0 if you need to be able to treat it as a C string.
The string "All that glitters is not gold\n" is longer than 20 characters. I suggest you try it with a larger buffer.
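A sketch of one way to avoid guessing the size: let the compiler size the array from the literal. The string is 30 characters long, so it needs 31 bytes with the terminator. The helper write_message and the use of stdio instead of write() are my own choices for the example.

```c
#include <stdio.h>

/* The compiler sizes the array to fit the literal plus its terminator. */
static const char message[] = "All that glitters is not gold\n";

/* Hypothetical helper: writes the message to an already opened stream.
   Returns 0 on success, -1 if fewer bytes were written. */
int write_message(FILE *out)
{
    size_t len = sizeof message - 1;   /* exclude the terminating 0 */
    return fwrite(message, 1, len, out) == len ? 0 : -1;
}
```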
Actually, if you're going to do any nontrivial work in C I suggest you never ever use strcpy, as a general habit. Use functions like strncpy which let you specify a buffer size so that it's clear you'll never overflow.
libgcc strcpy Manual says:
If the destination string of a
strcpy() is not large enough (that
is, if the programmer was stupid
or lazy, and failed to check the size
before copying) then anything might
happen. Overflowing fixed length
strings is a favorite cracker
technique.
Also the strlen says
The strlen() function calculates
the length of the string s, not
including the terminating '\0'
character.
So I guess strlen() does not return what you expect it to return, and as a result the extra characters are written.
To make the thing work, you need to allocate a large enough buffer, which can hold the entire string.

Why should you use strncpy instead of strcpy?

Edit: I've added the source for the example.
I came across this example:
char source[MAX] = "123456789";
char source1[MAX] = "123456789";
char destination[MAX] = "abcdefg";
char destination1[MAX] = "abcdefg";
char *return_string;
int index = 5;
/* This is how strcpy works */
printf("destination is originally = '%s'\n", destination);
return_string = strcpy(destination, source);
printf("after strcpy, dest becomes '%s'\n\n", destination);
/* This is how strncpy works */
printf( "destination1 is originally = '%s'\n", destination1 );
return_string = strncpy( destination1, source1, index );
printf( "After strncpy, destination1 becomes '%s'\n", destination1 );
Which produced this output:
destination is originally = 'abcdefg'
After strcpy, destination becomes '123456789'
destination1 is originally = 'abcdefg'
After strncpy, destination1 becomes '12345fg'
Which makes me wonder why anyone would want this effect. It looks like it would be confusing. This program makes me think you could basically copy over someone's name (eg. Tom Brokaw) with Tom Bro763.
What are the advantages of using strncpy() over strcpy()?
The strncpy() function was designed with a very particular problem in mind: manipulating strings stored in the manner of original UNIX directory entries. These used a short fixed-sized array (14 bytes), and a nul-terminator was only used if the filename was shorter than the array.
That's what's behind the two oddities of strncpy():
It doesn't put a nul-terminator on the destination if it is completely filled; and
It always completely fills the destination, with nuls if necessary.
For a "safer strcpy()", you are better off using strncat() like so:
if (dest_size > 0)
{
dest[0] = '\0';
strncat(dest, source, dest_size - 1);
}
That will always nul-terminate the result, and won't copy more than necessary.
strncpy combats buffer overflow by requiring you to put a length in it. strcpy depends on a trailing \0, which may not always occur.
Secondly, why you chose to only copy 5 characters on 7 character string is beyond me, but it's producing expected behavior. It's only copying over the first n characters, where n is the third argument.
The n functions are all used as defensive coding against buffer overflows. Please use them in lieu of older functions, such as strcpy.
While I know the intent behind strncpy, it is not really a good function. Avoid both. Raymond Chen explains.
Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.
See also Why is strncpy insecure?
strncpy is NOT safer than strcpy, it just trades one type of bugs with another. In C, when handling C strings, you need to know the size of your buffers, there is no way around it. strncpy was justified for the directory thing mentioned by others, but otherwise, you should never use it:
if you know the length of your string and buffer, why use strncpy? It is a waste of computing power at best (adding useless 0)
if you don't know the lengths, then you risk silently truncating your strings, which is not much better than a buffer overflow
What you're looking for is the function strlcpy(), which always terminates the string with 0 (as long as the buffer size is nonzero). It is also able to detect truncation, because it returns the length of the source string. The only problem is that it's not (really) portable and is present only on some systems (BSD, Solaris). This function also opens another can of worms, as can be seen from the discussions on
http://en.wikipedia.org/wiki/Strlcpy
My personal opinion is that it is vastly more useful than strncpy() and strcpy(). It has better performance and is a good companion to snprintf(). For platforms which do not have it, it is relatively easy to implement.
(For the development phase of an application I substitute these two functions (snprintf() and strlcpy()) with a trapping version which brutally aborts the program on buffer overflows or truncations. This allows catching the worst offenders quickly, especially if you work on a codebase from someone else.)
EDIT: strlcpy() can be implemented easily:
size_t strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t len = strlen(src);
    if(dstsize) {
        size_t bl = (len < dstsize-1 ? len : dstsize-1);
        ((char*)memcpy(dst, src, bl))[bl] = 0;
    }
    return len;
}
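Here is a sketch of how the return value can be used to detect truncation. It repeats the logic of the implementation above under the made-up name my_strlcpy, to avoid clashing with a system-provided strlcpy(); strlcpy_fits is likewise hypothetical.

```c
#include <string.h>

/* Same logic as the implementation above, under a hypothetical name. */
size_t my_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t len = strlen(src);
    if (dstsize) {
        size_t bl = (len < dstsize - 1 ? len : dstsize - 1);
        memcpy(dst, src, bl);
        dst[bl] = '\0';
    }
    return len;
}

/* Returns 1 if src fit into dst completely, 0 if it was truncated. */
int strlcpy_fits(char *dst, size_t dstsize, const char *src)
{
    /* The copy is complete only if the source length is below dstsize. */
    return my_strlcpy(dst, src, dstsize) < dstsize;
}
```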
The strncpy() function is the safer one: you have to pass the maximum length the destination buffer can accept. Otherwise it could happen that the source string is not correctly 0 terminated, in which case the strcpy() function could write more characters to destination, corrupting anything which is in the memory after the destination buffer. This is the buffer-overrun problem used in many exploits
Also for POSIX API functions like read() which does not put the terminating 0 in the buffer, but returns the number of bytes read, you will either manually put the 0, or copy it using strncpy().
In your example code, index is actually not an index, but a count - it tells how many characters at most to copy from source to destination. If there is no null byte among the first n bytes of source, the string placed in destination will not be null terminated
strncpy pads the destination with '\0' up to the given size, even if the destination is smaller than that size....
manpage:
If the length of src is less than n, strncpy() pads the remainder of
dest with null bytes.
and not only the remainder...also after this until n characters is
reached. And thus you get an overflow... (see the man page
implementation)
This may be used in many other scenarios where you need to copy only a portion of your original string to the destination. Using strncpy() you can copy a limited portion of the original string, as opposed to strcpy(). I see the code you have put up comes from publib.boulder.ibm.com.
That depends on our requirement.
For windows users
We use strncpy whenever we don't want to copy entire string or we want to copy only n number of characters. But strcpy copies the entire string including terminating null character.
strncpy is a safer version of strcpy. As a matter of fact, you should never use strcpy because of its potential buffer overflow vulnerability, which makes your system vulnerable to all sorts of attacks.
