I've spotted the following piece of C code, marked as BAD (aka buffer overflow bad).
The problem is I don't quite get why? The input string length is captured before the allocation etc.
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
strcpy(c, s); // BAD
}
return c;
}
Update from comments:
the 'BAD' marker is not precise, the code is not bad, not efficient yes, risky (below) yes,
why risky? +1 after the strlen() call is required to safely allocate the space on heap that also will keep the string terminator ('\0')
There is no bug in your sample function.
However, to make it obvious to future readers (both human and mechanical) that there is no bug, you should replace the strcpy call with a memcpy:
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
memcpy(c, s, len);
}
return c;
}
Either way, len bytes are allocated and len bytes are copied, but with memcpy that fact stands out much more clearly to the reader.
There's no problem with this code.
While it's possible that strcpy can cause undefined behavior if the destination buffer isn't large enough to hold the string in question, the buffer is allocated to be the correct size. This means there is no risk of overrunning the buffer.
You may see some guides recommend using strncpy instead, which allows you to specify the maximum number of characters to copy, but this has its own problems. If the source string is too long, only the specified number of characters will be copied, however this also means that the string isn't null terminated which requires the user to do so manually. For example:
char src[] = "test data";
char dest[5];
strncpy(dest, src, sizeof dest); // dest holds "test " with no null terminator
dest[sizeof(dest) - 1] = 0; // manually null terminate, dest holds "test"
I tend towards the use of strcpy if I know the source string will fit, otherwise I'll use strncpy and manually null-terminate.
I cannot see any problem with the code when it comes to the use of strcpy
But you should be aware that it requires s to be a valid C string. That is a reasonable requirement, but it should be specified.
If you want, you could put in a simple check for NULL, but I would say that it's ok to do without it. If you're about to make a copy of a "string" pointed to by a null pointer, then you probably should check either the argument or the result. But if you want, just add this as the first line:
if(!s) return NULL;
But as I said, it does not add much. It just makes it possible to change
if(!str) {
// Handle error
} else {
new_str = my_strdup(str);
}
to:
new_str = my_strdup(str);
if(!new_str) {
// Handle error
}
Not really a huge gain
Related
I have the following code in C now
int length = 50
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere() // read a string from somewhere
// with length, say 20
memcpy(target_str, source_str, length);
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy the source_str to target_str i use memcpy() as above with length 50, which is the size of target_str. The reason I use length in memcpy is that, the source_str can have a max value of length but is usually less than that (in the above example its 20).
Now, if I want to copy till length of source_str based on its terminating character ('\0'), even if memcpy length is more than the index of terminating character, is the above code a right way to do it? or is there an alternative suggestion.
Thanks for any help.
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy the source_str to target_str i use memcpy() as above with length 50, which is the size of target_str.
currently you ask for memcpy to read 30 characters after the end of the source string because it does not care of a possible null terminator on the source, this is an undefined behavior
because you copy a string you can use strcpy rather than memcpy
but the problem of size can be reversed, I mean the target can be smaller than the source, and without protection you will have again a undefined behavior
so you can use strncpy giving the length of the target, just take care of the necessity to add a final null character in case the target is smaller than the source :
int length = 50
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere(); // length unknown
strncpy(target_str, source_str, length - 1); // -1 to let place for \0
target_str[length - 1] = 0; // force the presence of a null character at end in case
If I want to copy the source_str to target_str i use memcpy() as above
with length 50, which is the size of target_str. The reason I use
length in memcpy is that, the source_str can have a max value of
length but is usually less than that (in the above example its 20).
It is crucially important to distinguish between
the size of the array to which source_str points, and
the length of the string, if any, to which source_str points (+/- the terminator).
If source_str is certain to point to an array of length 50 or more then the memcpy() approach you present is ok. If not, then it produces undefined behavior when source_str in fact points to a shorter array. Any result within the power of your C implementation may occur.
If source_str is certain to point to a (properly-terminated) C string of no more than length - 1 characters, and if it is its string value that you want to copy, then strcpy() is more natural than memcpy(). It will copy all the string contents, up to and including the terminator. This presents no problem when source_str points to an array shorter than length, so long as it contains a string terminator.
If neither of those cases is certain to hold, then it's not clear what you want to do. The strncpy() function may cover some of those cases, but it does not cover all of them.
Now, if I want to copy till length of source_str based on its terminating character ('\0'), even if memcpy length is more than the index of terminating character, is the above code a right way to do it?
No; you'd be copying the entire content of source_str, even past the null-terminator if it occurs before the end of the allocated space for the string it is pointing to.
If your concern is minimizing the auxiliary space used by your program, what you could do is use strlen to determine the length of source_str, and allocate target_str based on that. Also, strcpy is similar to memcpy but is specifically intended for null-terminated strings (observe that it has no "size" or "length" parameter):
char *target_str = NULL;
char *source_str = read_string_from_somewhere();
size_t len = strlen(source_str);
target_str = malloc(len + 1);
strcpy(target_str, source_str);
// ...
free(target_str);
target_str = NULL;
memcpy is used to copy fixed blocks of memory, so if you want to copy something shorter that is terminated by '\n' you don't want to use memcpy.
There is other functions like strncpy or strlcpy that do similar things.
Best to check what the implementations do. I removed the optimized versions from the original source code for the sake of readability.
This is an example memcpy implementation: https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c
void *memcpy(void *restrict dest, const void *restrict src, size_t n)
{
unsigned char *d = dest;
const unsigned char *s = src;
for (; n; n--) *d++ = *s++;
return dest;
}
It's clear that here, both pieces of memory are visited for n times. regardless of the size of source or destination string, which causes copying of memory past your string if it was shorter. Which is bad and can cause various unwanted behavior.
this is strlcpy from: https://git.musl-libc.org/cgit/musl/tree/src/string/strlcpy.c
size_t strlcpy(char *d, const char *s, size_t n)
{
char *d0 = d;
size_t *wd;
if (!n--) goto finish;
for (; n && (*d=*s); n--, s++, d++);
*d = 0;
finish:
return d-d0 + strlen(s);
}
The trick here is that n && (*d = 0) evaluates to false and will break the looping condition and exit early.
Hence this gives you the wanted behaviour.
Use strlen to determine the exact size of source_string and allocate accordingly, remembering to add an extra byte for the null terminator. Here's a full example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *source_str = "string_read_from_somewhere";
int len = strlen(source_str);
char *target_str = malloc(len + 1);
if (!target_str) {
fprintf(stderr, "%s:%d: malloc failed", __FILE__, __LINE__);
return 1;
}
memcpy(target_str, source_str, len + 1);
puts(target_str);
free(target_str);
return 0;
}
Also, there's no need to cast the result of malloc. Don't forget to free the allocated memory.
As mentioned in the comments, you probably want to restrict the size of the malloced string to a sensible amount.
I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following the best practice regrading malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I have can't "pass by reference" like a C programmer might for other functions, I have to return the new pointer.
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high level languages, (perl, plpgsql, bash) so my instinct is proper encapsulation of such things, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
Compiles and runs with two warnings on unused argc and argv arguments, you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *net_uri_to_text(char *);
int main(int argc, char ** argv) {
char * chr_input = "testing123%5a%5b%5cabc";
char * chr_output = net_uri_to_text(chr_input);
printf("%s\n", chr_output);
free(chr_output);
return 0;
}
//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
//define variables
int int_length = strlen(chr_input);
int int_new_length = int_length;
char * chr_output = malloc(int_length);
char * chr_output_working = chr_output;
char * chr_input_working = chr_input;
int int_output_working = 0;
unsigned int uint_hex_working;
//while not a null byte
while(*chr_input_working != '\0') {
//if %
if (*chr_input_working == *"%") {
//then put correct char in
sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
*chr_output_working = (char)uint_hex_working;
//printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
//realloc
chr_input_working++;
chr_input_working++;
int_new_length -= 2;
chr_output = realloc(chr_output, int_new_length);
//output working must be the new pointer plys how many chars we've done
chr_output_working = chr_output + int_output_working;
} else {
//put char in
*chr_output_working = *chr_input_working;
}
//increment pointers and number of chars in output working
chr_input_working++;
chr_output_working++;
int_output_working++;
}
//last null byte
*chr_output_working = '\0';
return chr_output;
}
It's perfectly ok to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though no function in the standard library does.
If you can compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer cheaply, you can offer a function that does that and let the user call it.
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
/*
* Decodes uri-encoded string encoded into buf of length len (including NUL).
* Returns the number of characters written. If that number is less than len,
* nothing is written and you should try again with a larger buffer.
*/
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
size_t space_needed = 0;
while (decoding_needs_to_be_done()) {
// decode characters, but only write them to buf
// if it wouldn't overflow;
// increment space_needed regardless
}
return space_needed;
}
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);
len = net_uri_to_text(input, result, len);
if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
// try again
result = xrealloc(input, result, len);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return mallocated obejcts and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can ensure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
char *unescape(const char *s)
{
size_t l = strlen(s);
char *p = malloc(l + 1), *r = p;
while (*s) {
if (*s == '%') {
char buf[3] = { s[1], s[2], 0 };
*p++ = strtol(buf, NULL, 16); // yes, I prefer this over scanf()
s += 3;
} else {
*p++ = *s++;
}
}
*p = 0;
return r;
}
And call it as follows:
int main()
{
const char *in = "testing123%5a%5b%5cabc";
char *out = unescape(in);
printf("%s\n", out);
free(out);
return 0;
}
It's perfectly OK to return newly-malloc-ed (and possibly internally realloced) values from functions, you just need to document that you are doing so (as you do here).
Other obvious items:
Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
sscanf is pretty heavy-weight for converting the two hex bytes. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? you can of course decide that this is the caller's problem but in general you might want to handle that). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function to calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc, and will keep the complexity the same. A possible function to find the size of the new string is:
int getNewSize (char *string) {
char *i = string;
int size = 0, percent = 0;
for (i, size; *i != '\0'; i++, size++) {
if (*i == '%')
percent++;
}
return size - percent * 2;
}
However, as mentioned in other answers there is no problem in returning a malloc'ed buffer as long as you document it!
Additionally what was already mentioned in the other postings, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, you may not reallocate it.
I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any possible string it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.
Another trick is to store both the length and capacity of the string at its front, so that (int)*mystring == length of string and (int)*(mystring + 4) == capacity of string. Thus, the string itself only starts at the 8th position *(mystring+8). By doing this you can pass around a single pointer to a string and always know how long it is and how much memory capacity the string has. You can make macros that automatically generate these offsets and make "pretty code".
The value of using buffers this way is you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.
At a recent job interview, I was asked to implement my own string copy function. I managed to write code that I believe works to an extent. However, when I returned home to try the problem again, I realized that it was a lot more complex than I had thought. Here is the code I came up with:
#include <stdio.h>
#include <stdlib.h>
char * mycpy(char * d, char * s);
int main() {
int i;
char buffer[1];
mycpy(buffer, "hello world\n");
printf("%s", buffer);
return 0;
}
char * mycpy (char * destination, char * source) {
if (!destination || !source) return NULL;
char * tmp = destination;
while (*destination != NULL || *source != NULL) {
*destination = *source;
destination++;
source++;
}
return tmp;
}
I looked at some other examples online and found that since all strings in C are null-terminated, I should have read up to the null character and then appended a null character to the destination string before exiting.
However one thing I'm curious about is how memory is being handled. I noticed if I used the strcpy() library function, I could copy a string of 10 characters into a char array of size 1. How is this possible? Is the strcpy() function somehow allocating more memory to the destination?
Good interview question has several layers, to which to candidate can demonstrate different levels of understanding.
On the syntactic 'C language' layer, the following code is from the classic Kernighan and Ritchie book ('The C programming language'):
while( *dest++ = *src++ )
;
In an interview, you could indeed point out the function isn't safe, most notably the buffer on *dest isn't large enough. Also, there may be overlap, i.e. if dest points to the middle of the src buffer, you'll have endless loop (which will eventually creates memory access fault).
As the other answers have said, you're overwriting the buffer, so for the sake of your test change it to:
char buffer[ 12 ];
For the job interview they were perhaps hoping for:
char *mycpy( char *s, char *t )
{
while ( *s++ = *t++ )
{
;
}
return s;
}
No, it's that strcpy() isn't safe and is overwriting the memory after it, I think. You're supposed to use strncpy() instead.
No, you're writing past the buffer and overwriting (in this case) the rest of your stack past buffer. This is very dangerous behavior.
In general, you should always create methods that supply limits. In most C libraries, these methods are denoted by an n in the method name.
C does not do any run time bounds checking like other languages(C#,Java etc). That is why you can write things past the end of the array. However, you won't be able to access that string in some cases because you might be encroaching upon memory that doesn't belong to you giving you a segementation fault. K&R would be a good book to learn such concepts.
The strcpy() function forgoes memory management entirely, therefore all allocation needs to be done before the function is called, and freed afterward when necessary. If your source string has more characters than the destination buffer, strcpy() will just keep writing past the end of the buffer into unallocated space, or into space that's allocated for something else.
This can be very bad.
strncpy() works similarly to strcpy(), except that it allows you to pass an additional variable describing the size of the buffer, so the function will stop copying when it reaches this limit. This is safer, but still relies on the calling program to allocate and describe the buffer properly -- it can still go past the end of the buffer if you provide the wrong length, leading to the same problems.
char * mycpy (char * destination, char * source) {
if (!destination || !source) return NULL;
char * tmp = destination;
while (*destination != NULL || *source != NULL) {
*destination = *source;
destination++;
source++;
}
return tmp;
}
In the above copy implementation, your tmp and destination are having the same data. Its better your dont retrun any data, and instead let the destination be your out parameter. Can you rewrite the same.
The version below works for me. I'm not sure if it is bad design though:
while(source[i] != '\0' && (i<= (MAXLINE-1)))
{
dest[i]=source[i];
++i;
}
In general it's always a good idea to have const modifier where it's possible, for example for the source parameter.
I have a string pointer like below,
char *str = "This is cool stuff";
Now, I've references to this string pointer like below,
char* start = str + 1;
char* end = str + 6;
So, start and end are pointing to different locations of *str. How can I copy the string chars falls between start and end into a new string pointer. Any existing C++/C function is preferable.
Just create a new buffer called dest and use strncpy
char dest[end-start+1];
strncpy(dest,start,end-start);
dest[end-start] = '\0'
Use STL std::string:
#include
const char *str = "This is cool stuff";
std::string part( str + 1, str + 6 );
This uses iterator range constructor, so the part of the C-string does not have to be zero-terminated.
It's best to do this with strcpy(), and terminate the result yourself. The standard strncpy() function has very strange semantics.
If you really want a "new string pointer", and be a bit safe with regard to lengths and static buffers, you need to dynamically allocate the new string:
char * ranged_copy(const char *start, const char *end)
{
char *s;
s = malloc(end - start + 1);
memcpy(s, start, end - start);
s[end - start] = 0;
return s;
}
If you want to do this with C++ STL:
#include <string>
...
std::string cppStr (str, 1, 6); // copy substring range from 1st to 6th character of *str
const char *newStr = cppStr.c_str(); // make new char* from substring
char newChar[] = new char[end-start+1]]
p = newChar;
while (start < end)
*p++ = *start++;
This is one of the rare cases when function strncpy can be used. Just calculate the number of characters you need to copy and specify that exact amount in the strncpy. Remember that strncpy will not zero-terminate the result in this case, so you'll have to do it yourself (which, BTW, means that it makes more sense to use memcpy instead of the virtually useless strncpy).
And please, do yourself a favor, start using const char * pointers with string literals.
Assuming that end follows the idiomatic semantics of pointing just past the last item you want copied (STL semantics are a useful idiom even if we're dealing with straight C) and that your destination buffer is known to have enough space:
memcpy( buf, start, end-start);
buf[end-start] = '\0';
I'd wrap this in a sub-string function that also took the destination buffer size as a parameter so it could perform a check and truncate the result or return an error to prevent overruns.
I'd avoid using strncpy() because too many programmers forget about the fact that it might not terminate the destination string, so the second line might be mistakenly dropped at some point by someone believing it unnecessary. That's less likely if memcpy() were used. (In general, just say no to using strncpy())
Can u Give solution for this code of typecasting, LPCTSTR(here lpsubkey) to Char*
for below code snippet ,
char* s="HKEY_CURRENT_USER\\";
strcat(s,(char*)lpSubKey);
printf("%S",s);
here it makes error of access violation ,so what will be the solution for that?.
...thanks in advance
There are several issues with your code that might well lead to the access violation. I don't think any have anything to do with the cast you mentioned.
You are assigning a pointer to the first element of a fixed size char array to a char * and then attempt to append to this using strcat. This is wrong as there is no additional space left in the implicitly allocated string array. You will need to allocate a buffer big enough to hold the resulting string and then copy the string constant in there before calling strcat. For example, like so:
char *s = (char*)malloc(1024 * sizeof(char));
strcpy(s, "HKEY_CURRENT_USER\\");
strcat(s, T2A(lpSubKey));
printf("%s", s);
free(s);
Please note that the fixed size array I'm allocating above is bad practise. In production code you should always determine the correct size of the array on the go to prevent buffer overflows or use functions like strncat and strncpy to ensure that you are not copying more data into the buffer than the buffer can hold.
These are not the same thing. What are you trying to do?
The problem is you are trying to append to a string that you have not reserved memory for.
Try:
char s[1024] = "HKEY_CURRENT_USER";
strcat(s,(char*)lpSubKey );
printf("%S",s);
Do be careful with the arbitrary size of 1024. If you expect your keys to be much longer your program will crash.
Also, look at strcat_s.
ATL and MFC has set of macros to such conversion, where used next letters:
W - wide unicode string
T - generic character string
A - ANSI character string
OLE - BSTR string,
so in your case you need T2A macros
strcat does not attempt to make room for the combination. You are overwriting memory that isn't part of the string. Off the top of my head:
char *strcat_with_alloc(char *s1, char *s2)
{
if (!s1 || !s2) return NULL;
size_t len1 = strlen(s1);
size_t len2 = strlen(s2);
char *dest = (char *)malloc(len1 + len2 + 1);
if (!dest) return NULL;
strcpy(dest, s1);
strcat(dest, s2);
return dest;
}
now try:
char* s="HKEY_CURRENT_USER\\";
char *fullKey = strcat_with_alloc(s,(char*)lpSubKey);
if (!fullKey)
printf("error no memory");
else {
printf("%S",fullKey);
free(fullKey);
}