My version of strlcpy - C

gcc 4.4.4 c89
My program does a lot of string copying. I don't want to use strncpy, as it doesn't always nul-terminate. And I can't use strlcpy, as it's not portable.
Just a few questions. How can I put my function through its paces to ensure that it is completely safe and stable? Unit testing?
Is this good enough for production?
size_t s_strlcpy(char *dest, const char *src, const size_t len)
{
size_t i = 0;
/* Always copy 1 less than the destination to make room for the nul */
for(i = 0; i < len - 1; i++)
{
/* only copy until the first nul is reached */
if(*src != '\0') {
*dest++ = *src++;
}
else {
break;
}
}
/* nul terminate the string */
*dest = '\0';
/* Return the number of bytes copied */
return i;
}
Many thanks for any suggestions,

Although you could simply use another strlcpy implementation, as another post recommends, or use snprintf(dest, len, "%s", src) (which always terminates the buffer as long as len > 0), here are the things I noticed looking at your code:
size_t s_strlcpy(char *dest, const char *src, const size_t len)
{
size_t i = 0;
There is no need to make len const here, but it can be helpful, since the compiler then checks that you don't modify it.
/* Always copy 1 less than the destination to make room for the nul */
for(i = 0; i < len - 1; i++)
{
Oops. What if len is 0? size_t is unsigned, so (size_t)0 - 1 wraps around to SIZE_MAX (4294967295 with a 32-bit size_t), causing your routine to careen through your program's memory and crash into an unmapped page.
/* only copy until the first nul is reached */
if(*src != '\0') {
*dest++ = *src++;
}
else {
break;
}
}
/* nul terminate the string */
*dest = '\0';
The above code looks fine to me.
/* Return the number of bytes copied */
return i;
}
According to Wikipedia, strlcpy returns strlen(src) (the actual length of the string), not the number of bytes copied. Hence, you need to keep counting the characters in src until you hit '\0', even if it exceeds len.
Also, if your for loop terminates on the len - 1 condition, your function will return len-1, not len like you'd expect it to.
When I write functions like this, I usually prefer to use a start pointer (call it S) and an end pointer (call it E). S points to the first character, while E points one past the last character (so E - S is the length of the string). Although this technique may seem ugly and obscure, I've found it to be fairly robust.
Here's an over-commented version of how I would write strlcpy:
size_t s_strlcpy(char *dest, const char *src, size_t len)
{
char *d = dest;
char *e = dest + len; /* end of destination buffer */
const char *s = src;
/* Insert characters into the destination buffer
until we reach the end of the source string
or the end of the destination buffer, whichever
comes first. */
while (*s != '\0' && d < e)
*d++ = *s++;
/* Terminate the destination buffer, being wary of the fact
that len might be zero. */
if (d < e) /* If the destination buffer still has room. */
*d = 0;
else if (len > 0) /* We ran out of room, so zero out the last char */
/* (if the destination buffer has any items at all). */
d[-1] = 0;
/* Advance to the end of the source string. */
while (*s != '\0')
s++;
/* Return the number of characters
between *src and *s,
including *src but not including *s .
This is the length of the source string. */
return s - src;
}

IMHO, just borrow the original strlcpy, as Ignacio Vazquez-Abram tersely stated. OpenBSD's code is battle-tested and the license terms rock ;).
As for your code, one thing I would add to what others have already said is purely a matter of personal taste:
/* only copy until the first nul is reached */
if(*src != '\0') {
*dest++ = *src++;
}
else {
break;
}
I would have written it like this:
if(*src == '\0') {
break;
}
*dest++ = *src++;
Both because it reduces the amount of unnecessary code people need to read, and because it is my style to consistently write that way, instead of if (ok) { do } else { handle error }. The comment above the if is also redundant (see the comment above the for loop).

Why don't you use something like memccpy() (a POSIX function, not part of C89) instead of rolling your own? You just have to handle the null termination yourself, but it's easier and generally faster to build on a standard function than to start from scratch.
Some architectures will heavily optimize or even use assembly for string functions to squeeze good performance out of them.
Without building or debugging:
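char *str = memccpy(dest, src, '\0', len);
/* NULL means no '\0' was copied within len bytes; otherwise dest is already terminated */
if (str == NULL && len > 0)
dest[len - 1] = '\0';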

Yes, unit testing. Check with lots of randomly generated strings.
Looks fine to me, though.

I would suggest that white-box testing (a form of unit testing) is useful for a function like this.

There is the DRY principle: "don't repeat yourself". In other words, do not write new code for something that is already a done deal; check the standard C library first, as the example above (WhilrWind) shows.
One reason is the testing mentioned above: the standard C library has been tested for years, so it is a safe bet that it works as advertised.
Learning by playing around with code is a great idea, keep on trying.

Unit testing?
Is this good enough for production?
For a "simple" function like this, unit testing is probably sufficient, although the only real way to test a function is to try to break it.
Pass it NULL pointers, 10k-character strings, "negative" values of len (which, since len is a size_t, arrive as huge positive values), deliberately corrupted data, and so on. In general, think: if you were a malicious user trying to break it, how would you do it?
See the link in my response here

I think it's a mistake to rely so much on the length and to do arithmetic on it.
The size_t type is unsigned. Consider how your function will behave if called with a 0-sized destination.

There's always static code analysis.
Edit: List of tools for static code analysis

Hmmm, did not realize this is an old post.
Is this good enough for production?
completely safe and stable (?)
Weaknesses:
Does not handle len == 0 correctly - easy to fix.
Return value questionable when source is long - easy to fix.
(Not discussed yet) Does not consider overlapping dest, src.
It is easy enough to get an unexpected result: when dest points inside src, if(*src != '\0') { *dest++ = *src++; } can overwrite the null character before it is read, so iteration ventures beyond the original '\0'.
// pathological example
char buf[16] = "abc";
const char *src = buf; // "abc"
char *dest = buf + 2; // "c"
size_t dest_sz = sizeof buf - 2;
s_strlcpy(dest, src, dest_sz);
puts(dest); // "ababababababa", usual expectation "abc"
Two ways forward:
restrict
Since C99, C has restrict, which tells the compiler it may assume that data read via src and data written via dest do not overlap. This allows optimizations the compiler otherwise cannot make. restrict also tells the caller not to pass overlapping buffers.
Code can still fail as above, but then that is the caller breaking the contract, not s_strlcpy().
Notes: the const in const size_t len is a distraction in the function declaration. It is also clearer to use size_t size than size_t len.
size_t s_strlcpy(char * restrict dest, const char * restrict src, size_t size);
This restrict usage mirrors the standard library's strcpy() and others:
char *strcpy(char * restrict s1, const char * restrict s2);
Handle overlap
The other is to make s_strlcpy() tolerant of overlapping memory, as below. That pretty much implies the code needs to use memmove().
#include <string.h> /* strlen, memmove */
size_t s_strlcpy(char *dest, const char *src, const size_t dest_size) {
size_t src_len = strlen(src);
if (src_len < dest_size) {
memmove(dest, src, src_len + 1); // handles overlap without UB
} else if (dest_size > 0) {
// Not enough room
memmove(dest, src, dest_size - 1); // handles overlap without UB
dest[dest_size - 1] = '\0';
}
return src_len; // I do not think OP's return value is correct; it should be the src length.
}
Hopefully I coded all the functionality of strlcpy() correctly. The edge cases take time to sort out.

Related

Why is this use of strcpy considered bad?

I've spotted the following piece of C code, marked as BAD (aka buffer overflow bad).
The problem is that I don't quite see why: the input string length is captured before the allocation, etc.
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
strcpy(c, s); // BAD
}
return c;
}
Update from comments:
The 'BAD' marker is imprecise: the code is not wrong; inefficient, yes, and risky (see below), yes.
Why risky? The + 1 after the strlen() call is required so that the heap allocation also has room for the string terminator ('\0'); drop it and strcpy writes one byte past the buffer.
There is no bug in your sample function.
However, to make it obvious to future readers (both human and mechanical) that there is no bug, you should replace the strcpy call with a memcpy:
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
memcpy(c, s, len);
}
return c;
}
Either way, len bytes are allocated and len bytes are copied, but with memcpy that fact stands out much more clearly to the reader.
There's no problem with this code.
While it's possible that strcpy can cause undefined behavior if the destination buffer isn't large enough to hold the string in question, the buffer is allocated to be the correct size. This means there is no risk of overrunning the buffer.
You may see some guides recommend using strncpy instead, which lets you specify the maximum number of characters to copy, but it has its own problems. If the source string is too long, only the specified number of characters is copied, and the result is then not null-terminated, so you must terminate it manually. For example:
char src[] = "test data";
char dest[5];
strncpy(dest, src, sizeof dest); // dest holds "test " with no null terminator
dest[sizeof(dest) - 1] = 0; // manually null terminate, dest holds "test"
I tend to use strcpy if I know the source string will fit; otherwise I use strncpy and manually null-terminate.
I cannot see any problem with the code when it comes to the use of strcpy.
But you should be aware that it requires s to be a valid C string. That is a reasonable requirement, but it should be specified.
If you want, you could put in a simple check for NULL, but I would say that it's ok to do without it. If you're about to make a copy of a "string" pointed to by a null pointer, then you probably should check either the argument or the result. But if you want, just add this as the first line:
if(!s) return NULL;
But as I said, it does not add much. It just makes it possible to change
if(!str) {
// Handle error
} else {
new_str = my_strdup(str);
}
to:
new_str = my_strdup(str);
if(!new_str) {
// Handle error
}
Not really a huge gain.

strlcpy in terms of strncpy on platforms that don't have strlcpy

I know strlcpy is safer than strncpy when copying from source to destination character arrays where we want the destination to be null-terminated.
Is the following wrapper ok?
size_t strlcpy(char *dst, const char *src, size_t size) {
if (size != 0) {
int maxSize = size - 1;
int currSize = -1;
while ((++currSize < maxSize) && (*dst++ = *src++));
*dst = 0;
return currSize;
}
return 0;
}
Please comment.
A check like size > 0 is misleading, because size_t is unsigned and can never be negative; if (size != 0), as written here, is the more readable check.
Another problem is that strncpy pads its destination with zeros up to size bytes, which strlcpy does not do. If you want to match the behavior of strlcpy on systems where it is not available, write your own implementation rather than relying on strncpy.
Your implementation is very fast, but it has two quirks:
One quirk is that your function writes two NUL characters instead of one when there is enough space.
Another quirk is the return value, which doesn't indicate truncation.
Apart from these quirks, your version is functionally equivalent to strxcpy() by attractivechaos. However, your code is 30-100% faster depending on which machine I use. Good job in that respect!
Regarding the return value, here are the differences I observe:
the original strlcpy returns length of src:
cons: unsafe, unnecessarily slow
pros: strlcpy(d,s,n) is equivalent to snprintf(d,n,"%s",s)
strxcpy by attractivechaos returns number of bytes written:
cons: return value doesn't clearly indicate an issue when src is too long
cons: deviates from strlcpy in return value
your function returns the length of the string written:
cons:
return value doesn't clearly indicate an issue when src is too long
return value doesn't indicate whether NUL-character was written or not
pros: more in line with the original strlcpy.
In my preferred implementation all the mentioned functional disadvantages are fixed:
ssize_t safe_strlcpy(char *dst, const char *src, size_t size)
{
if (size == 0)
return -1;
size_t ret = strnlen(src, size);
size_t len = (ret >= size) ? size - 1 : ret;
memcpy(dst, src, len);
dst[len] = '\0';
return ret;
}
It returns size when src was too long and didn't fit entirely, -1 if size is zero, and otherwise the length of the string written. So when everything is OK, the return value is still in line with strlcpy. It's based on Git's implementation of the original strlcpy. Note that ssize_t and strnlen() come from POSIX rather than ISO C.

C: One error In strcpy function and cannot find it

I know that this strcpy function below is incorrect, but I cannot seem to figure out the one or two things I need to change to fix it. Any help would be greatly appreciated as I am completely stuck on this:
void strcpy(char *destString, char *sourceString)
{
char *marker, *t;
for(marker = sourceString, t = destString; *marker; marker++, t++)
*t = *marker;
}
Well, it depends on your environment.
For example I see a few things I don't like:
You do not check that the input parameters are != NULL. Passing NULL will cause a null-pointer dereference (*0 access).
I see you are not terminating your string with the '\0' character (or 0). So, after the loop (please indent), add *t = 0;
strcpy() is a predefined function, and you are trying to create your own function with the same name, so when you compile your program you get a conflicting-types error (for example, when <string.h> is included). So first, rename your function.
If you want to implement your own strcpy(), then I would suggest implementing a strncpy()-style function instead. The version below copies at most n-1 bytes from the source null-terminated character array to the destination character array, and always adds a null character at the end of the destination.
void my_strncpy(char *dest, const char *src, size_t n)
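{
size_t i;
/* n == 0 would make n - 1 wrap around to SIZE_MAX */
if ((dest == NULL) || (src == NULL) || (n == 0))
return;
for(i=0; i<(n-1) && src[i]; i++)
dest[i] = src[i];
dest[i]='\0';
}
It won't overflow the buffer, provided n is the real size of dest.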
Note - My implementation is different from standard library strncpy() implementation. The standard library function strncpy() copies at most n bytes of src. If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
I know that this strcpy function below is incorrect, but I cannot seem to figure out the one or two things I need to change to fix it.
You only need to add a null character at the end of the destination array.
void strcpy(char *destString, char *sourceString)
{
char *marker, *t;
for(marker = sourceString, t = destString; *marker; marker++, t++)
*t = *marker;
*t='\0';
}
This is a very simple approach:
#include <stdio.h>
void copy(char * src, char * dst){
while(*src != '\0'){
*dst = *src;
src++;
dst++;
}
*dst = '\0';
}
int main(int argc, char** argv){
char src [] = "hello";
char dst [sizeof src]; /* must hold all of src, including its '\0' */
copy(src, dst);
printf("src: %s\n", src);
printf("dst: %s\n", dst );
}
It's more or less like wildplasser's comment. First, you iterate over the src pointer. In C, when you reach '\0' (in a well-formed string), you can stop, because it is the final character. So you iterate over src, assign the value of src (*src) to the value of dst (*dst), and then you only have to increment both pointers...
That's all.
A very easy strcpy function would be:
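int my_strcpy(char *dest, const char *source)
{
size_t i;
if (source == NULL || dest == NULL)
{
/* Allocating a buffer for a NULL dest would not help: dest is a local copy
of the caller's pointer, so the caller would never see the new memory. */
printf("The source or destination pointer is NULL\n");
return 0;
}
for (i = 0; source[i] != '\0'; i++)
{
dest[i] = source[i];
}
dest[i] = '\0';
return 1;
}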
You should have no problem copying strings this way. Always use indexes instead of pointer operations; it's easier, IMO.
If you use an IDE, you should learn to use its debugger to discover errors and problems. When you deal with strings, one of the most common runtime problems is a missing '\0', which makes your string functions run into memory zones where they shouldn't be.

Implementation of strcpy and strcat that deals with exceptions

I have to write strcpy() and strcat() in 7 lines of code and deal with any exceptions there could be. This is my code so far. Does anyone have any suggestions on how I can reduce the number of lines?
char *mystrcpy(char *dst, const char *src)
{
char *ptr;
ptr = dst;
while(*dst++=*src++);
return(ptr);
}
void strcat(char *dest, const char *src)
{
while (*dest!= '\0')
*dest++ ;
do
{
*dest++ = *src++;
}
while (src != '\0') ;
}
char *mystrcpy(char *dst, const char *src){
return (*dst=*src) ? (mystrcpy(dst+1, src+1), dst) : dst;
}
char *mystrcat(char *dst, const char *src){
return (*dst ? mystrcat(dst+1, src) : mystrcpy(dst, src)), dst;
}
You have a problem with your code: you are testing src itself against '\0', and not the char it is pointing to.
while (src != '\0');
should be
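while (*src != '\0');
Even then, the do/while exits before copying src's terminating '\0' (and reads past the end when src is empty), so a plain while ((*dest++ = *src++) != '\0'); is the more robust loop.
First get it right, then get it short / fast.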
You are writing about being exception-safe all over this page. Since you are using C, there is no native language concept of exceptions (as opposed to C++).
Anyway, the only type of exceptions your code could raise are hardware exceptions (page fault, stack fault, alignment check, ...), which can't be caught by a normal C++ try-catch. Since the handling of hardware exceptions is platform dependent, you would need a platform-specific mechanism to catch them (like SEH on Windows, or a signal handler on Unix). This approach is nothing I would recommend; read on.
Much better than catching a hardware exception is to try as hard as you can to prevent it. In your code, this just means testing the input pointers for != 0. Note that you have no chance to identify an invalid pointer like char* dst = (char*)0xDEADBEEF;, and the only way to handle the exception after it was accessed would be platform-dependent code like that mentioned above. But hard errors like this typically shouldn't be handled by your program at all.
Example:
// Returns the new string length, or -1 on error.
// (Renamed from strcat: the standard function of that name has a different signature.)
int my_strcat(char* dst, char const* src)
{
// Check arguments before doing anything.
if (dst == 0)
return -1;
if (src == 0)
return -1;
// Store begin of destination to compute length later on.
char const* const dstBegin = dst;
// Skip to end of destination.
while (*dst != '\0')
dst++;
// Copy source to destination, including the terminating '\0'.
while ((*dst++ = *src++) != '\0')
;
// Return new length of string (excluding terminating '\0').
return (int)(dst - dstBegin - 1);
}
Optionally you could introduce a dstSize parameter which would indicate the size of the destination buffer, so that you could effectively detect and prevent buffer overflows.

Interview Question-Concatenate two Strings without using strcat in C

Recently I attended an interview where they asked me to write a C program to concatenate two strings without using strcat(), strlen() and strcmp() and that function should not exceed two (2) lines.
I know how to concatenate two strings without using strcat(). But my method has nearly 15 lines. I don't know how to write it in two lines.
I expect they wanted something like this:
void mystrcat(char * dest, const char * src)
{
//advance dest until we find the terminating null
while (*dest) ++dest;
//copy src to dest including terminating null, until we hit the end of src
//Edit: originally this:
//for (; *dest = *src, *src; ++dest, ++src);
//...which is the same as this
for (; *dest = *src; ++dest, ++src);
}
It doesn't return the end of the concatenated string like the real strcat, but that doesn't seem to be required.
I don't necessarily know if this sort of thing is a good interview question - it shows that you can code tersely, and that you know what strcat does, but that's about it.
Edit: as aib writes, the statement
while (*dest++ = *src++);
...is perhaps a more conventional way of writing the second loop (instead of using for).
Given that the task was to concatenate two strings, not to create a duplicate of strcat, I'd go with the simple option of creating a completely new string that is a combination of the two.
char buffer[REASONABLE_MAX] = {0};
snprintf(buffer, REASONABLE_MAX, "%s%s", string1, string2); // the size already counts the terminating '\0'
The proper answer to that question is that it asks you to demonstrate a skill that it is bad to have. They want you to demonstrate the ability to write hacker code. They want you to invent your own implementation of things already provided by every C compiler, which is a waste of time. They want you to write streamlined code which, by definition, is not readable. The 15-line implementation is probably better if it is more readable. Most projects do not fail because the developers wasted 150 clock cycles. Some do fail because someone wrote unmaintainable code. If you did have to write that, it would need a 15-line comment. So my answer would be: show me the performance metrics that justify not using the standard library and requiring the most optimal solution. Time is much better spent on design and on gathering those performance metrics.
Never forget - you are also interviewing them.
//assuming szA contains "first string" and szB contains "second string"
//and both are null terminated
// iterate over A until you get to null, then iterate over B and add to the end of A
// and then add null termination to A
// WARNING: memory corruption likely if either string is not NULL terminated
// WARNING: memory corruption likely if the storage buffer for A was not allocated large
// enough for A to store all of B's data
// Justification: Performance metric XXX has shown this optimization is needed
int i = 0, j; while (szA[i] != '\0') i++;
for (j = 0; (j == 0) || (szB[j-1] != '\0'); j++) szA[i+j] = szB[j];
*edit, 9/27/2010
After reading some other solutions to this, I think the following is probably the best code answer:
//Posted by Doug in answer below this one
void my_strcat(char * dest, const char * src)
{
while (*dest) ++dest;
while (*dest++ = *src++);
}
But I would follow that up with a safe version of that:
void my_safe_strcat(char * dest, const unsigned int max_size, const char * src)
{
int characters_used=0;
while (*dest) { ++dest; characters_used++; }
while ( (characters_used < (max_size-1) ) && (*dest++ = *src++) ) characters_used++;
*dest = 0; //ensure we end with a null
}
And follow that up with a fully readable version (which the compiler should optimize to roughly the same code as above), along with a test application, which was the real point of the question:
void my_readable_safe_strcat(char * dest, const unsigned int max_size, const char * src)
{
unsigned int characters_used = 0;
while (*dest != '\0')
{
++dest;
characters_used++;
}
while ( (characters_used < (max_size-1) ) && (*dest = *src) )
{
dest++;
src++;
characters_used++;
}
*dest = 0; //ensure we end with a null
}
#include <stdio.h>
int main(void) // portable replacement for Visual Studio's int _tmain(int argc, _TCHAR* argv[])
{
char szTooShort[15] = "First String";
char szLongEnough[50] = "First String";
char szClean[] = "Second String";
char szDirty[5] = {'f','g','h','i','j'}; // deliberately not null-terminated: reading it as a string is undefined behavior
my_readable_safe_strcat(szTooShort,15,szClean);
printf("This string should be cut off:\n%s\n\n",szTooShort);
my_readable_safe_strcat(szLongEnough,50,szClean);
printf("This string should be complete:\n%s\n\n",szLongEnough);
my_readable_safe_strcat(szLongEnough,50,szDirty);
printf("This string probably has junk data in it, but shouldn't crash the app:\n%s\n\n",szLongEnough);
}
Two lines? Bwah...
void another_strcat(char* str1, const char* str2)
{
strcpy(strchr(str1, '\0'), str2);
}
EDIT: I'm very upset that people are so against strcpy and strchr. Waah! So, I thought I'd play by the spirit of the rules:
char thing(char* p, const char* s)
{
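// p[1] is cleared before recursing, so the scan never runs into stale bytes past str1's end
return *p ? thing(&p[1], s) : (*p = *s) ? (p[1] = '\0', thing(&p[1], &s[1])) : '\0';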
}
I still can't understand how anyone would take 2 whole lines ;-P.
I tested this bit in VS2008, and it worked fine.
void NewStrCat(char* dest, const char* src)
{
while (*dest) ++dest;
while (*dest++ = *src++);
}
Any function can be made to fit in a single line by simply removing all the \n.
However, I think you're looking for this answer:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
char string1[32] = "Hello";
char string2[] = ", World!";
char *dst = string1 + strlen(string1);
char *src = string2;
while (*dst++ = *src++); //single statement
printf("\"%s\"\n", string1);
return EXIT_SUCCESS;
}
The explanation is rather simple:
src++ yields the current pointer (to the character being copied) and then increments src to point to the next one. * dereferences this pointer, and a similar expression on the LHS copies it to dst. The result of the whole = expression is the character that was copied, so a simple while loops until a '\0' is encountered and copied.
However:
strcat() is easier to read and possibly much faster. Any other solution is feasible only when strcat() is not available. (Or when you're in an interview, apparently.)
And replace strcat() above with strncat() unless you're really really sure the destination string is big enough.
Edit: I missed the part about strlen() being disallowed. Here's the two-statement function:
void my_strcat(char * restrict dst, const char * restrict src)
{
while (*dst) ++dst; //move dst to the end of the string
while (*dst++ = *src++); //copy src to dst
}
Note that the standard strcat() function returns the original value of dst.
One line:
sprintf(string1, "%s%s", string1, string2);
(Note that this invokes undefined behavior, because string1 is both the destination and one of the sources.)
Addendum
The ISO C99 standard states that:
If copying takes place between objects that overlap, the behavior is undefined.
That being said, the code above will still probably work correctly. It works with MS VC 2010.
I have a feeling such questions are meant for elimination rather than selection: it is easier to eliminate candidates with such convoluted questions than to select candidates by asking more real-world questions.
Just a rant from me, since I am also looking for a job, facing such questions, and have answered quite a few of them thanks to SO!
void StringCatenation(char *str1,char *str2)
{
int len1,i=0;
for(len1=0;*(str1+len1);len1++);
do{
str1[len1+i]=str2[i];
}
while(*(str2+i++)); /* stops only after the '\0' has been copied */
}
void my_strcat(char* dest, const char* src)
{
while (*dest) ++dest;
while (*dest++ = *src++);
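}
The destination string ends up null-terminated because the second loop copies src's '\0' as well; adding *dest = '\0'; after it would write one byte past the end of the result.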

Resources