Given:
char test[] = "bla-bla-bla";
Which of the two is more correct?
char *test1 = malloc(strlen(test));
strcpy(test1, test);
or
char *test1 = malloc(sizeof(test));
strcpy(test1, test);
This will work on all null-terminated strings, including pointers to char arrays:
char test[] = "bla-bla-bla";
char *test1 = malloc(strlen(test) + 1);
strcpy(test1, test);
You won't get the correct size of the array pointed to by char* or const char* with sizeof. This solution is therefore more versatile.
Neither:
#include <string.h>
char *mine = strdup(test);
You should use strlen, because sizeof will fail silently if you change test to be a run-time defined string. This means that strlen is a far safer idea than sizeof as it will keep working.
char test[]="bla-bla-bla";
char *test1 = malloc(strlen(test) + 1); // +1 for the terminating null character
strcpy(test1, test);
I think sizeof is the correct one. The reason is that strlen(str) gives you the length of the string excluding the terminating null. strcpy copies the whole string including the terminating null, so you allocate one byte too few if you pass strlen alone to malloc. sizeof, applied to the array test, gives the size of the array including the terminating null, so the malloc'd chunk is exactly large enough for the copy. Note that this holds only while test is an array; applied to a pointer, sizeof yields the size of the pointer.
1) definitely causes UB
2) may cause UB (if malloc fails)
I'd go with 2) as there is a better chance of the construct working as intended; or even better I'd write a version that works as intended (without UB) in all situations.
Edit
Undefined Behaviour in 1)
test1 will have space for the characters in test, but not for the terminating '\0'. The call to strcpy() will try to write a '\0' to memory that does not belong to test1, hence UB.
Undefined Behaviour in 2)
If the call to malloc() fails to reserve the requested memory, test1 will be assigned NULL. Passing NULL to strcpy() invokes UB.
The return value of calls to malloc() (and calloc() and friends) should always be tested to ensure the operation worked as expected.
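A minimal sketch of a version that avoids both cases of undefined behaviour described above: it reserves room for the terminator and tests the result of malloc(). The name dup_string is made up for this illustration; on POSIX systems strdup() does essentially the same job.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* dup_string is a hypothetical helper name, not a standard function.
 * Returns a heap copy of src, or NULL if malloc() fails.
 * The caller must free() the result. */
char *dup_string(const char *src)
{
    assert(src != NULL);

    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;                 /* let the caller handle the failure */

    memcpy(copy, src, size);         /* copies the terminator as well */
    return copy;
}
```

Because the size comes from strlen() rather than sizeof, the same helper works whether the caller holds an array or only a pointer.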
(1) with strlen but not adding 1 is definitely incorrect. If you add 1, it would have the added benefit that it also works for pointers, not just arrays.
On the other hand, (2) is preferred as long as your string is actually an array, as it results in a compile-time constant, rather than a call to strlen (and thus faster and smaller code). Actually a modern compiler like gcc can probably optimize the strlen out if it knows the string is constant, but it may be hard for the compiler to determine this, so I'd always use sizeof when possible.
If it is a critical path, sizeof has advantage over strlen as it has an O(1) complexity which can save CPU cycles.
Related
This problem is blowing my mind. Can anyone please sort out the problem? I have already wasted hours on this. ;(
#include <stdio.h>
#include <string.h>
int main(){
char string[] = "Iam pretty much big string.";
char temp1[50];
char temp2[10];
// strcpy() and strncpy()
strcpy(temp1, string);
printf("%s\n", temp1);
strncpy(temp2, temp1, 10);
printf("%s\n", temp2);
return 0;
}
Result
Iam pretty much big string.
Iam prettyIam pretty much big string.
Expected Result:
Iam pretty much big string.
Iam pretty
The strncpy function is respecting the 10 byte limit you're giving it.
It copies the first 10 bytes from temp1 to temp2. None of those 10 bytes is a null byte, and the size of temp2 is 10, so there are no null bytes in temp2. When you then pass temp2 to printf, it reads past the end of the array, invoking undefined behavior.
You would need to set the size given to strncpy to the array size - 1, then manually add the null byte to the end.
strncpy(temp2, temp1, sizeof(temp2)-1);
temp2[sizeof(temp2)-1] = 0;
In this run temp2 evidently sits just before temp1 in memory (the layout is up to the compiler), and because the final 0 is not copied, printf continues printing past the end of temp2 into temp1.
As long as you do not insert the 0, the result of printf is undefined.
You invoke Undefined Behavior attempting to print temp2 as temp2 is not nul-terminated. From man strncpy:
"Warning: If there is no null byte among the first n bytes of src,
the string placed in dest will not be null-terminated." (emphasis in
original)
See also C11 Standard - 7.24.2.4 The strncpy function (specifically footnote: 308)
So temp2 is not nul-terminated.
Citation of the appropriate [strncpy] tag on Stack Overflow https://stackoverflow.com/tags/strncpy/info, which may help you to understand what happens exactly:
This function is not recommended to use for any purpose, neither in C nor C++. It was never intended to be a "safe version of strcpy" but is often misused for such purposes. It is in fact considered to be much more dangerous than strcpy, since the null termination mechanism of strncpy is not intuitive and therefore often misunderstood. This is because of the following behavior specified by ISO 9899:2011 7.24.2.4:
char *strncpy(char * restrict s1,
const char * restrict s2,
size_t n);
/--/
3 If the array pointed to by s2 is a string that is shorter than n characters, null characters
are appended to the copy in the array pointed to by s1, until n characters in all have been
written.
A very common mistake is to pass an s2 which is exactly as many characters as the n parameter, in which case s1 will not get null terminated. That is: strncpy(dst, src, strlen(src));
/* MCVE of incorrect use of strncpy */
#include <string.h>
#include <stdio.h>
int main (void)
{
const char* STR = "hello";
char buf[] = "halt and catch fire";
strncpy(buf, STR, strlen(STR));
puts(buf); // prints "helloand catch fire"
return 0;
}
Recommended practice in C is to check the buffer size in advance and then use strcpy(), alternatively memcpy().
Recommended practice in C++ is to use std::string instead.
From the manpage for strncpy():
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
Either your input is shorter than the supplied length, you add the terminating null byte yourself, or it won't be there. printf() expects the string to be properly null terminated, and thus overruns your allocated buffer.
This only goes to show that the n variants of many standard functions are by no means safe. You must read their respective man pages, and specifically look for what they do when the supplied length does not suffice.
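One way to avoid the missing terminator altogether is a small helper that always terminates the destination and tells the caller whether anything was cut off. This is only a sketch; copy_truncated is a made-up name, and it assumes the destination has at least one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* copy_truncated is a hypothetical helper name.
 * Copies at most dst_size - 1 characters of src into dst and always
 * writes a terminating '\0'. Returns 1 if src was truncated, else 0. */
int copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    int truncated = len >= dst_size;
    if (truncated)
        len = dst_size - 1;          /* leave room for the terminator */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return truncated;
}
```

Called as copy_truncated(temp2, sizeof temp2, temp1) with the 10-byte temp2 from the question, it stores the 9 characters "Iam prett" plus the terminator and returns 1.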
char* xpx(char* src)
{
char result[sizeof(src)];
strcpy(result,src);
return result;
}
There are 2 bugs in the above code.
1) strcpy is passing the src as parameters but it is not legal as str
is a pointer.
I was not able to find the other one. Could you help me?
Look at strcpy docs:
char *strcpy( char *restrict dest, const char *restrict src );
Copies the null-terminated byte string pointed to by src, including
the null terminator, to the character array whose first element is
pointed to by dest. The behavior is undefined if the dest array is
not large enough. [...] (important but not relevant to the question so
omitted part)
So strcpy does take 2 pointers. That's fine. It's not a bug.
To find the bug pay attention to (and think about it, it's logical): dest array must be large enough. What does "large enough" mean here? Well since the functions copies the string from src, including null terminator to dest it means dest must be at least the length of the string in src + 1 for the null terminator. That means strlen(src) + 1.
sizeof(src) is the same as sizeof(char*), which is the size of a char pointer on the platform. That is the size of the pointer, not the length of the string, and not what you want.
The next error is that the functions returns the address of an automatic storage object, aka the result array. This means that the array will cease to exist when the function exits and thus the function returns a pointer to an object that is no longer valid. A solution to this would be to use malloc to allocate the array. Another is to change the signature of xpx to something similar to strcpy where the destination array is supplied.
So summing them you need something along this:
char* result = malloc(strlen(src) + 1);
Another bug (yes, technically it's not a bug, but semantically it's a bug in my opinion) is that src should be of const char* type.
Another source of potential problems and bugs is if the input is not well-behaved, e.g. if src is not null-terminated, but it is debatable how much responsibility this function should carry regarding this.
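The other fix mentioned above, letting the caller supply the destination, could look roughly like this. The name xpx_into and the 0 / -1 return convention are assumptions made for the sketch, since the question never states what xpx is supposed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* xpx_into is a hypothetical rework of xpx, not the asker's intended API.
 * Copies src into dest if it fits, terminator included.
 * Returns 0 on success, -1 if dest is too small (dest is left untouched). */
int xpx_into(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL);

    size_t needed = strlen(src) + 1;   /* length plus the '\0' */
    if (needed > dest_size)
        return -1;

    memcpy(dest, src, needed);
    return 0;
}
```

Nothing is returned by address here, so the lifetime problem of the local array disappears, and the const qualifier on src documents that the input is not modified.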
not legal as str is a pointer
Not sure what you mean by that, but passing pointers to functions is perfectly fine.
As for "bugs" in the function, it is hard to say if there are any, given that you have not provided what is the expected behavior of the function. The function is legal C, although quite useless and dangerous to use (and will trigger warnings from common compilers):
Do not return addresses to local variables, since they do not exist anymore after your return, so you shall not access them.
If you are passing a null-terminated string into a function and you need to get a buffer of its size; then you have to find out its length at runtime using something like strlen() and then allocate the memory on the heap with malloc() (or using VLAs on C99 and later).
I encountered the following example of using memset in tutorialspoint:
#include <stdio.h>
#include <string.h>
int main(){
char src[40];
char dest[100];
memset(dest, '\0', sizeof(dest));
strcpy(src, "This is tutorialspoint.com");
strcpy(dest, src);
printf("Final copied string : %s\n", dest);
return(0);
}
I don't get why the memset line is used, as the program compiles and produces the same result when that line is commented out. Is that line necessary? Is it good practice to do so before strcpy()? Or is it just a random line?
Thanks!
It's not needed in this case, in the sense that it has no effect on the output. It might be needed in some similar cases.
char dest[100];
This defines dest as a local array of 100 chars. Its initial value is garbage. It could have been written as:
char dest[100] = "";
or
char dest[100] = { 0 };
but none of those are necessary because dest is assigned a value before it's used.
strcpy(src, "This is tutorialspoint.com");
strcpy(dest, src);
This copies the string contained in src into the array dest. It copies the 26 characters of "This is tutorialspoint.com" plus 1 additional character, the terminating '\0' that marks the end of the string. The previous contents of the dest array are ignored. (If we were using strcat(), it would matter, because strcat() has to find a '\0' in the destination before it can start copying.)
Without the memset() call, the remaining 73 bytes of dest would be garbage -- but that wouldn't matter, because we never look at anything past the '\0' at dest[26].
If, for some reason, we decided to add something like:
printf("dest[99] = '%c'\n", dest[99]);
to the program, then the memset() would matter. But since the purpose of dest is to hold a string (which is by definition terminated by a '\0' null character), that wouldn't be a sensible thing to do. Perfectly legal, but not sensible.
The posted code could skip the initialization via memset().
One time it really becomes useful is when debugging, when you use the debugger to display the contents of the variable.
Another time to use memset() is when allocating something like an array of pointers, which might not all be set to point to something specific, like more allocated memory.
Then, when those pointers are later passed to free(), the unused ones are NULL (on the usual platforms where an all-zero bit pattern is a null pointer) and so will not cause a crash.
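A small sketch of that array-of-pointers case. It assumes the array is initialised with an initialiser such as char *items[4] = { NULL }; rather than memset(), because the C standard does not promise that all-zero bytes form a null pointer, even though they do on mainstream platforms. free_all is a made-up name.

```c
#include <assert.h>
#include <stdlib.h>

/* free_all is a hypothetical helper name.
 * Frees every slot of an array of pointers and resets it to NULL.
 * Unused slots must already be NULL; free(NULL) is a no-op. */
void free_all(char **items, size_t count)
{
    assert(items != NULL);

    for (size_t i = 0; i < count; i++) {
        free(items[i]);
        items[i] = NULL;   /* guards against an accidental double free */
    }
}
```

If the slots were left uninitialised instead, the first free() of an unused slot would be handed a garbage address.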
In C I have a path in one of my strings
/home/frankv/
I now want to add the name of files that are contained in this folder - e.g. file1.txt file123.txt etc.
Having declared my variable either like this
char pathToFile[strlen("/home/frankv/")+1]
or
char *pathToFile = malloc(strlen("/home/frankv/")+1)
My problem is that I cannot simply add more characters because it would cause a buffer overflow. Also, what do I do in case I do not know how long the filenames will be?
I've really gotten used to PHP lazy $string1.$string2 .. What is the easiest way to do this in C?
If you've allocated a buffer with malloc(), you can use realloc() to expand it:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *buf;
const char s1[] = "hello";
const char s2[] = ", world";
buf = malloc(sizeof s1);
strcpy(buf, s1);
buf = realloc(buf, sizeof s1 + sizeof s2 - 1);
strcat(buf, s2);
puts(buf);
return 0;
}
NOTE: I have omitted error checking. You shouldn't. Always check whether malloc() returns a null pointer; if it does, take some corrective action, even if it's just terminating the program. Likewise for realloc(). And if you want to be able to recover from a realloc() failure, store the result in a temporary so you don't clobber your original pointer.
Use std::string, if possible. Else, reallocate another block of memory and use strcpy and strcat.
You have a couple options, but, if you want to do this dynamically using no additional libraries, realloc() is the stdlib function you're looking for:
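char *pathToFile = malloc(strlen("/home/frankv/")+1);
if (!pathToFile) abort();
strcpy(pathToFile, "/home/frankv/"); /* the buffer must hold a string before strlen() is called on it */
const char *string_to_add = "filename.txt";
char *p = realloc(pathToFile, strlen(pathToFile) + strlen(string_to_add) + 1);
if (!p) abort();
pathToFile = p;
strcat(pathToFile, string_to_add);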
Note: you should always assign the result of realloc to a new pointer first, as realloc() returns NULL on failure. If you assign to the original pointer, you are begging for a memory leak.
If you're going to be doing much string manipulation, though, you may want to consider using a string library. Two I've found useful are bstring and ustr.
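If you stay with plain realloc(), the grow-then-append step can be wrapped once so that the temporary pointer and the size arithmetic live in one place. A minimal sketch; append_string is a made-up name and the 0 / -1 return convention is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* append_string is a hypothetical helper name.
 * *buf must point to a malloc()ed, null-terminated string, and suffix
 * must not point into *buf (realloc() may move the block).
 * Returns 0 on success; on failure returns -1 and leaves *buf valid. */
int append_string(char **buf, const char *suffix)
{
    assert(buf != NULL && *buf != NULL && suffix != NULL);

    size_t old_len = strlen(*buf);
    size_t add_len = strlen(suffix);

    char *grown = realloc(*buf, old_len + add_len + 1);
    if (grown == NULL)
        return -1;                      /* original block is still allocated */

    memcpy(grown + old_len, suffix, add_len + 1);  /* includes the '\0' */
    *buf = grown;
    return 0;
}
```

A caller would start from a malloc()ed copy of "/home/frankv/" and call append_string(&pathToFile, "file1.txt"); the file name can be of any length because the sizes are measured at run time.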
If you can use C++, use std::string. If you must use pure C, use what's called doubling: when you run out of space in the string, double the memory and copy the string into the new memory. And you'll have to use the second syntax:
char *pathToFile = malloc(strlen("/home/frankv/")+1);
You have chosen the wrong language for manipulating strings!
The easy and conventional way out is to do something like:
#define MAX_PATH 260
char pathToFile[MAX_PATH+1] = "/home/frankv/";
strcat(pathToFile, "wibble/");
Of course, this is error prone - if the resulting string exceeds MAX_PATH characters, anything can happen, and it is this sort of programming which is the route many trojans and worms use to penetrate security (by corrupting memory in a carefully defined way). Hence my deliberate choice of 260 for MAX_PATH, which is what it used to be in Windows - you can still make Windows Explorer do strange things to your files with paths over 260 characters, possibly because of code like this!
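strncat may be a small help, but note that its size argument is the maximum number of characters to append, not the total size of the destination. You therefore have to pass the space still free in the destination, minus one for the terminating null that strncat always writes.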
To do it robustly you need a string library which does variable length strings correctly. But I don't know if there is such a thing for C (C++ is a different matter, of course).
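Short of a full string library, a fixed buffer can at least be filled safely with snprintf(), which never writes past the size it is given and reports how long the result would have been. A sketch under that approach; build_path is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* build_path is a hypothetical helper name.
 * Writes dir followed by name into out. Returns 0 on success and -1 if
 * the result did not fit (out then holds a truncated, terminated string)
 * or if snprintf() reported an error. */
int build_path(char *out, size_t out_size, const char *dir, const char *name)
{
    assert(out != NULL && dir != NULL && name != NULL && out_size > 0);

    int written = snprintf(out, out_size, "%s%s", dir, name);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

The caller passes sizeof pathToFile as out_size, so a path that is too long becomes a detectable error instead of a buffer overflow.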
I will be coaching an ACM Team next month (go figure), and the time has come to talk about strings in C. Besides a discussion on the standard lib, strcpy, strcmp, etc., I would like to give them some hints (something like str[0] is equivalent to *str, and things like that).
Do you know of any lists (like cheat sheets) or your own experience in the matter?
I'm already aware of the books for the ACM competition (which are good, see particularly this), but I'm after tricks of the trade.
Thank you.
Edit: Thank you very much everybody. I will accept the most voted answer, and have duly upvoted others which I think are relevant. I expect to do a summary here (like I did here, asap). I have enough material now and I'm certain this has improved the session on strings immensely. Once again, thanks.
It's obvious but I think it's important to know that strings are nothing more than an array of bytes, delimited by a zero byte.
C strings aren't all that user-friendly as you probably know.
Writing a zero byte somewhere in the string will truncate it.
Going out of bounds generally ends badly.
Never, ever use strcpy, strcmp, strcat, etc.; instead use their bounded variants: strncmp, strncat, strndup, ...
Avoid strncpy. strncpy will not always null-terminate your string! If the source string doesn't fit in the destination buffer, it truncates the string but doesn't write a nul byte at the end of the buffer. Also, even if the source string is a lot shorter than the destination buffer, strncpy still fills the rest of the buffer with zeroes. I personally use strlcpy (a BSD function, not part of ISO C).
Don't use printf(string), instead use printf("%s", string). Try thinking of the consequences if the user puts a %d in the string.
You can't compare strings with if( s1 == s2 )
doStuff(s1);
You have to compare every character in the string. Use strcmp or better strncmp.
if( strncmp( s1, s2, BUFFER_SIZE ) == 0 )
doStuff(s1);
Abusing strlen() will dramatically worsen the performance.
for( int i = 0; i < strlen( string ); i++ ) {
processChar( string[i] );
}
will have at least O(n^2) time complexity whereas
int length = strlen( string );
for( int i = 0; i < length; i++ ) {
processChar( string[i] );
}
will have at least O(n) time complexity. This is not so obvious for people who haven't taken time to think of it.
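When the length is not needed for anything else, the loop can skip strlen() entirely and stop at the terminator. A small sketch of that pattern; count_char is a made-up example function standing in for processChar-style work.

```c
#include <assert.h>
#include <stddef.h>

/* count_char is a hypothetical example function.
 * Counts occurrences of c by walking the string once and stopping at
 * the terminator, so strlen() is never called. */
size_t count_char(const char *string, char c)
{
    assert(string != NULL);

    size_t count = 0;
    for (const char *p = string; *p != '\0'; p++) {
        if (*p == c)
            count++;
    }
    return count;
}
```

The string is traversed exactly once, which is the same O(n) cost as the cached-length version without the extra pass that strlen() makes.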
The following functions can be used to implement a non-mutating strtok:
strcspn(string, delimiters)
strspn(string, delimiters)
The first one returns the index of the first character that is in the set of delimiters you pass in. The second one returns the index of the first character that is not in the set of delimiters you pass in.
I prefer these to strpbrk as they return the length of the string if they can't match.
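As a rough sketch of how the two calls combine into a tokenizer that leaves the input untouched: strspn() skips the delimiters in front of a token and strcspn() measures the token itself. The name next_token and its cursor-style interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* next_token is a hypothetical helper name.
 * Skips leading delimiters at *cursor, then reports the next token's
 * start (return value) and length (*len) without modifying the string.
 * Advances *cursor past the token. Returns NULL when no token is left. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    assert(cursor != NULL && *cursor != NULL && delims != NULL && len != NULL);

    const char *start = *cursor + strspn(*cursor, delims);  /* skip delimiters */
    if (*start == '\0') {
        *cursor = start;
        *len = 0;
        return NULL;
    }

    *len = strcspn(start, delims);   /* length up to next delimiter or end */
    *cursor = start + *len;
    return start;
}
```

Tokens are returned as pointer plus length rather than as terminated strings, so they can be printed with the "%.*s" format or copied out as needed. Since all state lives in the caller's cursor, two tokenizations can be interleaved freely.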
str[0] is equivalent to 0[str], or more generally str[i] is i[str] and i[str] is *(str + i).
NB
this is not specific to strings but it works also for C arrays
The strn* variants in stdlib do not necessarily null terminate the destination string.
As an example: from MSDN's documentation on strncpy:
The strncpy function copies the
initial count characters of strSource
to strDest and returns strDest. If
count is less than or equal to the
length of strSource, a null character
is not appended automatically to the
copied string. If count is greater
than the length of strSource, the
destination string is padded with null
characters up to length count.
Don't confuse strlen() with sizeof() when using a string:
char *p = "hello!!";
strlen(p) != sizeof(p)
sizeof(p) yields, at compile time, the size of the pointer (typically 4 or 8 bytes), whereas strlen(p) counts, at runtime, the length of the null-terminated char array (7 in this example).
strtok is not thread safe, since it uses a mutable private buffer to store data between calls; you also cannot interleave or nest strtok calls.
A more useful alternative is strtok_r (POSIX); use it whenever you can.
kmm has already a good list. Here are the things I had problems with when I started to code C.
String literals have static storage duration, so they exist for the whole run of the program. Hence they can, for example, be the return value of a function.
Memory management of strings, in particular with a high level library (not libc). Who is responsible to free the string if it is returned by function or passed to a function?
When should "const char *" and when "char *" be used. And what does it tell me if a function returns a "const char *".
All these questions are not too difficult to learn, but hard to figure out if you don't get taught them.
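I have found that the char buff[0] technique has been incredibly useful. (A zero-length array is a GNU extension; standard C99 spells the same idea as a flexible array member, char payload[].)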
Consider:
struct foo {
int x;
char * payload;
};
vs
struct foo {
int x;
char payload[0];
};
see https://stackoverflow.com/questions/295027
See the link for implications and variations
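For illustration, here is how the second layout is typically allocated, written with the standard C99 flexible array member rather than the [0] spelling. The names struct message and message_new are invented for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* struct message and message_new are hypothetical names. */
struct message {
    int x;
    char payload[];   /* C99 flexible array member; must be the last member */
};

/* Allocates the header and a copy of text in a single block.
 * Returns NULL if malloc() fails; release with a single free(). */
struct message *message_new(int x, const char *text)
{
    assert(text != NULL);

    size_t size = strlen(text) + 1;               /* text plus its '\0' */
    struct message *m = malloc(sizeof *m + size); /* header + payload */
    if (m == NULL)
        return NULL;

    m->x = x;
    memcpy(m->payload, text, size);
    return m;
}
```

Compared with the char *payload version, there is one allocation instead of two and the string sits directly behind the header in memory.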
I'd point out the performance pitfalls of over-reliance on the built-in string functions.
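char* triple(const char* source)
{
size_t n=strlen(source);
char* dest=malloc(n*3+1);
if (dest==NULL) return NULL;
strcpy(dest,source);
strcat(dest,source); /* strcat rescans dest from the start to find its end */
strcat(dest,source); /* ...and again, over an even longer string */
return dest;
}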
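The cost in triple() comes from each strcat() call walking dest from the beginning to find where to append. Since the length is already known, one possible rework copies each piece to a computed offset instead. A sketch only; triple_fast is a made-up name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* triple_fast is a hypothetical name for a reworked triple().
 * Returns a heap string holding source three times, or NULL on failure. */
char *triple_fast(const char *source)
{
    assert(source != NULL);

    size_t n = strlen(source);          /* measured once */
    char *dest = malloc(n * 3 + 1);
    if (dest == NULL)
        return NULL;

    memcpy(dest, source, n);            /* each copy lands at a known offset */
    memcpy(dest + n, source, n);
    memcpy(dest + 2 * n, source, n);
    dest[3 * n] = '\0';
    return dest;
}
```

For three copies the difference is small, but the same pattern applied in a loop turns quadratic string building into linear work.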
I would discuss when and when not to use strcpy and strncpy and what can go wrong:
char *strncpy(char* destination, const char* source, size_t n);
char *strcpy(char* destination, const char* source );
I would also mention return values of the ansi C stdlib string functions. For example ask "does this if statement pass or fail?"
if (stricmp("StrInG 1", "string 1")==0)
{
.
.
.
}
perhaps you could illustrate the value of sentinel '\0' with following example
char* a = "hello \0 world";
char b[100];
strcpy(b,a);
printf("%s", b); /* prints "hello " only: strcpy stopped at the embedded '\0' */
I once had my fingers burnt when in my zeal I used strcpy() to copy binary data. It worked most of the time but failed mysteriously sometimes. Mystery was revealed when I realized that binary input sometimes contained a zero byte and strcpy() would terminate there.
You could mention indexed addressing.
An element's address is the base address + index * sizeof(element).
A common error is:
char *p;
snprintf(p, 3, "%d", 42);
This may appear to work, but p is uninitialized, so snprintf writes through an indeterminate pointer and the behaviour is undefined. Then funny things happen (welcome to the jungle).
Explanation
With char *p you only reserve space for the pointer itself (sizeof(char*) bytes); no buffer is allocated for it to point at. The right thing here is to allocate a buffer, point p at it, and pass the buffer's size:
char buf[12];
char *p = buf;
snprintf(p, sizeof(buf), "%d", 42);
Pointers and arrays, while having the similar syntax, are not at all the same. Given:
char a[100];
char *p = a;
For the array a, there is no pointer stored anywhere. sizeof(a) != sizeof(p): for the array it is the size of the block of memory, for the pointer it is the size of the pointer. This becomes important if you use something like sizeof(a)/sizeof(a[0]). Also, you can't ++a, and you can make the pointer a 'const' pointer to 'const' chars, but the array can only be an array of 'const' chars, in which case you have to initialize it at its definition.
If possible, use strlcpy (instead of strncpy) and strlcat.
Even better, to make life a bit safer, you can use a macro such as:
#define strlcpy_sz(dst, src) (strlcpy(dst, src, sizeof(dst)))