C - memcpy with char * with length greater than source string length - c

I have the following code in C now
int length = 50
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere() // read a string from somewhere
// with length, say 20
memcpy(target_str, source_str, length);
The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy the source_str to target_str i use memcpy() as above with length 50, which is the size of target_str. The reason I use length in memcpy is that, the source_str can have a max value of length but is usually less than that (in the above example its 20).
Now, if I want to copy till length of source_str based on its terminating character ('\0'), even if memcpy length is more than the index of terminating character, is the above code a right way to do it? or is there an alternative suggestion.
Thanks for any help.

The scenario is that target_str is initialized with 50 bytes. source_str is a string of length 20.
If I want to copy the source_str to target_str i use memcpy() as above with length 50, which is the size of target_str.
currently you ask for memcpy to read 30 characters after the end of the source string because it does not care of a possible null terminator on the source, this is an undefined behavior
because you copy a string you can use strcpy rather than memcpy
but the problem of size can be reversed, I mean the target can be smaller than the source, and without protection you will have again a undefined behavior
so you can use strncpy giving the length of the target, just take care of the necessity to add a final null character in case the target is smaller than the source :
int length = 50
char *target_str = (char*) malloc(length);
char *source_str = read_string_from_somewhere(); // length unknown
strncpy(target_str, source_str, length - 1); // -1 to let place for \0
target_str[length - 1] = 0; // force the presence of a null character at end in case

If I want to copy the source_str to target_str i use memcpy() as above
with length 50, which is the size of target_str. The reason I use
length in memcpy is that, the source_str can have a max value of
length but is usually less than that (in the above example its 20).
It is crucially important to distinguish between
the size of the array to which source_str points, and
the length of the string, if any, to which source_str points (+/- the terminator).
If source_str is certain to point to an array of length 50 or more then the memcpy() approach you present is ok. If not, then it produces undefined behavior when source_str in fact points to a shorter array. Any result within the power of your C implementation may occur.
If source_str is certain to point to a (properly-terminated) C string of no more than length - 1 characters, and if it is its string value that you want to copy, then strcpy() is more natural than memcpy(). It will copy all the string contents, up to and including the terminator. This presents no problem when source_str points to an array shorter than length, so long as it contains a string terminator.
If neither of those cases is certain to hold, then it's not clear what you want to do. The strncpy() function may cover some of those cases, but it does not cover all of them.

Now, if I want to copy till length of source_str based on its terminating character ('\0'), even if memcpy length is more than the index of terminating character, is the above code a right way to do it?
No; you'd be copying the entire content of source_str, even past the null-terminator if it occurs before the end of the allocated space for the string it is pointing to.
If your concern is minimizing the auxiliary space used by your program, what you could do is use strlen to determine the length of source_str, and allocate target_str based on that. Also, strcpy is similar to memcpy but is specifically intended for null-terminated strings (observe that it has no "size" or "length" parameter):
char *target_str = NULL;
char *source_str = read_string_from_somewhere();
size_t len = strlen(source_str);
target_str = malloc(len + 1);
strcpy(target_str, source_str);
// ...
free(target_str);
target_str = NULL;

memcpy is used to copy fixed blocks of memory, so if you want to copy something shorter that is terminated by '\n' you don't want to use memcpy.
There is other functions like strncpy or strlcpy that do similar things.
Best to check what the implementations do. I removed the optimized versions from the original source code for the sake of readability.
This is an example memcpy implementation: https://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c
void *memcpy(void *restrict dest, const void *restrict src, size_t n)
{
unsigned char *d = dest;
const unsigned char *s = src;
for (; n; n--) *d++ = *s++;
return dest;
}
It's clear that here, both pieces of memory are visited for n times. regardless of the size of source or destination string, which causes copying of memory past your string if it was shorter. Which is bad and can cause various unwanted behavior.
this is strlcpy from: https://git.musl-libc.org/cgit/musl/tree/src/string/strlcpy.c
size_t strlcpy(char *d, const char *s, size_t n)
{
char *d0 = d;
size_t *wd;
if (!n--) goto finish;
for (; n && (*d=*s); n--, s++, d++);
*d = 0;
finish:
return d-d0 + strlen(s);
}
The trick here is that n && (*d = 0) evaluates to false and will break the looping condition and exit early.
Hence this gives you the wanted behaviour.

Use strlen to determine the exact size of source_string and allocate accordingly, remembering to add an extra byte for the null terminator. Here's a full example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *source_str = "string_read_from_somewhere";
int len = strlen(source_str);
char *target_str = malloc(len + 1);
if (!target_str) {
fprintf(stderr, "%s:%d: malloc failed", __FILE__, __LINE__);
return 1;
}
memcpy(target_str, source_str, len + 1);
puts(target_str);
free(target_str);
return 0;
}
Also, there's no need to cast the result of malloc. Don't forget to free the allocated memory.
As mentioned in the comments, you probably want to restrict the size of the malloced string to a sensible amount.

Related

Why is this use of strcpy considered bad?

I've spotted the following piece of C code, marked as BAD (aka buffer overflow bad).
The problem is I don't quite get why? The input string length is captured before the allocation etc.
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
strcpy(c, s); // BAD
}
return c;
}
Update from comments:
the 'BAD' marker is not precise, the code is not bad, not efficient yes, risky (below) yes,
why risky? +1 after the strlen() call is required to safely allocate the space on heap that also will keep the string terminator ('\0')
There is no bug in your sample function.
However, to make it obvious to future readers (both human and mechanical) that there is no bug, you should replace the strcpy call with a memcpy:
char *my_strdup(const char *s)
{
size_t len = strlen(s) + 1;
char *c = malloc(len);
if (c) {
memcpy(c, s, len);
}
return c;
}
Either way, len bytes are allocated and len bytes are copied, but with memcpy that fact stands out much more clearly to the reader.
There's no problem with this code.
While it's possible that strcpy can cause undefined behavior if the destination buffer isn't large enough to hold the string in question, the buffer is allocated to be the correct size. This means there is no risk of overrunning the buffer.
You may see some guides recommend using strncpy instead, which allows you to specify the maximum number of characters to copy, but this has its own problems. If the source string is too long, only the specified number of characters will be copied, however this also means that the string isn't null terminated which requires the user to do so manually. For example:
char src[] = "test data";
char dest[5];
strncpy(dest, src, sizeof dest); // dest holds "test " with no null terminator
dest[sizeof(dest) - 1] = 0; // manually null terminate, dest holds "test"
I tend towards the use of strcpy if I know the source string will fit, otherwise I'll use strncpy and manually null-terminate.
I cannot see any problem with the code when it comes to the use of strcpy
But you should be aware that it requires s to be a valid C string. That is a reasonable requirement, but it should be specified.
If you want, you could put in a simple check for NULL, but I would say that it's ok to do without it. If you're about to make a copy of a "string" pointed to by a null pointer, then you probably should check either the argument or the result. But if you want, just add this as the first line:
if(!s) return NULL;
But as I said, it does not add much. It just makes it possible to change
if(!str) {
// Handle error
} else {
new_str = my_strdup(str);
}
to:
new_str = my_strdup(str);
if(!new_str) {
// Handle error
}
Not really a huge gain

How to replace a substring in a string?

I have a string, and in it I need to find a substring and replace it. The one to be found and the one that'll replace it are of different length. My code, partially:
char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";
char *sub_s1 = strcasestr(source_str, pattern);
printf("sub_s1: %s\r\n", sub_s1);
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption
char *new_str = (char *)malloc(strlen(source_str) - strlen(pattern) + strlen(new_sub_s) + 1);
strcat(new_str, '\0');
strcat(new_str, "??? part before pattern ???");
strcat(new_str, new_sub_s);
strcat(new_str, "??? part after pattern ???");
Why do I have memory corruption?
How do I effective extract and replace pattern with new_sub_s?
There are multiple problems in your code:
you do not test if sub_s1 was found in the string. What if there is no match?
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); passes a difference of pointers for %s that expects a string. The behavior is undefined.
strcat(new_str, '\0'); has undefined behavior because the destination string is uninitialized and you pass a null pointer as the string to concatenate. strcat expects a string pointer as its second argument, not a char, and '\0' is a character constant with type int (in C) and value 0, which the compiler will convert to a null pointer, with or without a warning. You probably meant to write *new_str = '\0';
You cannot compose the new string with strcat as posted: because the string before the match is not a C string, it is a fragment of a C string. You should instead determine the lengths of the different parts of the source string and use memcpy to copy fragments with explicit lengths.
Here is an example:
char *patch_string(const char *source_str, const char *pattern, const char *replacement) {
char *match = strcasestr(source_str, pattern);
if (match != NULL) {
size_t len = strlen(source_str);
size_t n1 = match - source_str; // # bytes before the match
size_t n2 = strlen(pattern); // # bytes in the pattern string
size_t n3 = strlen(replacement); // # bytes in the replacement string
size_t n4 = len - n1 - n2; // # bytes after the pattern in the source string
char *result = malloc(n1 + n3 + n4 + 1);
if (result != NULL) {
// copy the initial portion
memcpy(result, source_str, n1);
// copy the replacement string
memcpy(result + n1, replacement, n3);
// copy the trailing bytes, including the null terminator
memcpy(result + n1 + n3, match + n2, n4 + 1);
}
return result;
} else {
return strdup(source_str); // always return an allocated string
}
}
Note that the above code assumes that the match in the source string has be same length as the pattern string (in the example, strings "cccc" an "CcCc" have the same length). Given that strcasestr is expected to perform a case independent search, which is confirmed by the example strings in the question, it might be possible that this assumption fail, for example if the encoding of upper and lower case letters have a different length, or if accents are matched by strcasestr as would be expected in French: "é" and "E" should match but have a different length when encoded in UTF-8. If strcasestr has this advanced behavior, it is not possible to determine the length of the matched portion of the source string without a more elaborate API.
printf("sub_str before pattern: %s\r\n", sub_s1 - source_str); // Memory corruption
You're taking the difference of two pointers, and printing it as though it was a pointer to a string. In practice, on your machine, this probably calculates a meaningless number and interprets it as a memory address. Since this is a small number, when interpreted as an address, on your system, this probably points to unmapped memory, so your program crashes. Depending on the platform, on the compiler, on optimization settings, on what else there is in your program, and on the phase of the Moon, anything could happen. It's undefined behavior.
Any half-decent compiler would tell you that there's a type mismatch between the %s directive and the argument. Turn those warnings on. For example, with GCC:
gcc -Wall -Wextra -Werror -O my_program.c
char *new_str = (char *)malloc(…);
strcat(new_str, '\0');
strcat(new_str, "…");
The first call to strcat attempts to append '\0'. This is a character, not a string. It happens that since this is the character 0, and C doesn't distinguish between characters and numbers, this is just a weird way of writing the integer 0. And any integer constant with the value 0 is a valid way of writing a null pointer constant. So strcat(new_str, '\0') is equivalent to strcat(new_str, NULL) which will probably crash due to attempting to dereference the null pointer. Depending on the compiler optimizations, it's possible that the compiler will think that this block of code is never executed, since it's attempting to dereference a null pointer, and this is undefined behavior: as far as the compiler is concerned, this can't happen. This is a case where you can plausibly expect that the undefined behavior causes the compiler to do something that looks preposterous, but makes perfect sense from the way the compiler sees the program.
Even if you'd written strcat(new_str, "\0") as you probably intended, that would be pointless. Note that "\0" is a pointless way of writing "": there's always a null terminator at the end of a string literal¹. And appending an empty string to a string wouldn't change it.
And there's another problem with the strcat calls. At this point, the content of new_str is not initialized. But strcat (if called correctly, even for strcat(new_str, ""), if the compiler doesn't optimize this away) will explore this uninitialized memory and look for the first null byte. Because the memory is uninitialized, there's no guarantee that there is a null byte in the allocated memory, so strcat may attempt to read from an unmapped address when it runs out of buffer, or it may corrupt whatever. Or it may make demons fly out of your nose: once again it's undefined behavior.
Before you do anything with the newly allocated memory area, make it contain the empty string: set the first character to 0. And before that, check that malloc succeeded. It will always succeed in your toy program, but not in the real world.
char *new_str = malloc(…);
if (new_str == NULL) {
return NULL; // or whatever you want to do to handle the error
}
new_str[0] = 0;
strcat(new_str, …);
¹ The only time there isn't a null pointer at the end of a "…" is when you use this to initialize an array and the characters that are spelled out fill the whole array without leaving room for a null terminator.
snprintf can be used to calculate the memory needed and then print the string to the allocated pointer.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ( void) {
char *source_str = "aaa bbb CcCc dddd kkkk xxx yyyy";
char *pattern = "cccc";
char *new_sub_s = "mmmmm4343afdsafd";
char *sub_s1 = strcasestr(source_str, pattern);
int span = (int)( sub_s1 - source_str);
char *tail = sub_s1 + strlen ( pattern);
size_t size = snprintf ( NULL, 0, "%.*s%s%s", span, source_str, new_sub_s, tail);
char *new_str = malloc( size + 1);
snprintf ( new_str, size, "%.*s%s%s", span, source_str, new_sub_s, tail);
printf ( "%s\n", new_str);
free ( new_str);
return 0;
}

C programming: how to fill an array?

I know strncpy(s1, s2, n) copies n elements of s2 into s1, but that only fills it from the beginning of s1.
For example
s1[10] = "Teacher"
s2[20] = "Assisstant"
strncpy(s1, s2, 2) would yield s1[10] = "As", correct?
Now what if I want s1 to contain "TeacherAs"? How would I do that? Is strncpy the appropriate thing to use in this case?
You can use strcat() to concatenate strings, however you don't want all of the source string copied in this case, so you need to use something like:
size_t len = strlen(s1);
strncpy(s1 + len - 1, s2, 2);
s2[len + 2] = '\0';
(Add terminating nul; thanks #FatalError).
Which is pretty horrible and you need to worry about the amount of space remaining in the destination array. Please note that if s1 is empty that code will break!
There is strncat() (manpage) under some systems, which is much simpler to use:
strncat(s1, s2, 2);
Use strcat.
Make sure your string you're appending to is big enough to hold both strings. In your case it isn't.
From the link above:
char * strcat ( char * destination, const char * source );
Concatenate strings
Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.
destination and source shall not overlap.
In order to achieve what you need you have to use strlcat (but beware! it is considered insecure)
strlcat(s1, s2, sizeof(s1));
This will concatenate to s1, part of the s2 string, until the size of s1 is reached (this avoids memory overflow)
then you'll get into s1 the string TeacherAs + a NUL char to terminate it
you need to make sure that you have enough memory is allocated for the resulting string
s1[10]
is not enough space to fit 'TeacherAs'.
from there, you'll want to do something like
//make sure s1 is big enough to hold s1+s2
s1[40]="Teacher";
s2[20]="Assistant";
//how many chars from second string you want to append
int offset = 2;
//allocate a temp buffer
char subbuff[20];
//copy n chars to buffer
memcpy( subbuff, s2, offset );
//null terminate buff
subbuff[offset+1]='\0';
//do the actual cat
strcat(s1,subbuff);
I'd suggest using snprintf(), like:
size_t len = strlen(s1);
snprintf(s1 + len, sizeof(s1) - len, "%.2s", s2);
snprintf() will always nul terminate and won't overrun your buffer. Plus, it's standard as of C99. As a note, this assumes that s1 is an array declared in the current scope so that sizeof works, otherwise you'll need to provide the size.

Fatal error in wchar_t* to char* conversion

Here is a C code that converts a wchar_t* string into a char* string :
wchar_t *myXML = L"<test/>";
size_t length;
char *charString;
size_t i;
length = wcslen(myXML);
charString = (char *)malloc(length);
wcstombs_s(&i, charString, length, myXML, length);
The code compiles but at exectution it detects a fatal error at the last line and stops running.
Now, if I replace the last line with this one :
wcstombs_s(&i, charString, length+1, myXML, length);
I just added +1 to the third argument. Then it works perfectly...
Why is there a need to add this trick ? Or is there a flaw elsewhere in my code ?
You need one extra byte for the '\0' terminator character. wcslen does not include this in the length it returns!
To do this properly, you don't just need to pass length+1 to wcstombs_s but also to malloc:
charString = (char *)malloc(length+1);
wcstombs_s(&i, charString, length+1, myXML, length);
And even then, I suspect it will not work correctly. Not all wide characters can be mapped to a single char, so for non-ASCII characters you will need extra space in the multi-byte string.
DESCRIPTION
The wcslen() function is the wide-character
equivalent of the strlen(3) function. It determines
the length of the wide-character string pointed to by
s, not including the terminating L'\0' character.
The trick is that you should always look for code of the form:
string = malloc(len);
very suspiciously, because both wcslen(3) and strlen(3) return the string length without the nul byte, and malloc(3) must allocate the space with that byte. C kinda sucks sometimes.
So every time you see string = malloc(len); rather than string = malloc(len+1);, be very careful to read how len gets assigned.
char String = (char *)malloc(length + 1);
Ought to do the trick. :)
EDIT:
Better would be to ask wcstombs() for the size to allocate in the first place:
size_t len = wcstombs(NULL,src,0) + 1;
char *dest = malloc(len);
len = wcstombs(dest, src, len);
if (len == -1) /* handle error */ ...
The +1 allocates for the ascii nul, and wcstombs() will report how much memory is required to do the conversion. It'll do the conversion twice, once to keep track of the memory required, and then once to store the result, but it will be MUCH simpler to maintain. The second time, when it stores the result, it will write at most len bytes including the ascii nul.

Typecast:LPCTSTR to Char * for string concatenate operation

Can u Give solution for this code of typecasting, LPCTSTR(here lpsubkey) to Char*
for below code snippet ,
char* s="HKEY_CURRENT_USER\\";
strcat(s,(char*)lpSubKey);
printf("%S",s);
here it makes error of access violation ,so what will be the solution for that?.
...thanks in advance
There are several issues with your code that might well lead to the access violation. I don't think any have anything to do with the cast you mentioned.
You are assigning a pointer to the first element of a fixed size char array to a char * and then attempt to append to this using strcat. This is wrong as there is no additional space left in the implicitly allocated string array. You will need to allocate a buffer big enough to hold the resulting string and then copy the string constant in there before calling strcat. For example, like so:
char *s = (char*)malloc(1024 * sizeof(char));
strcpy(s, "HKEY_CURRENT_USER\\");
strcat(s, T2A(lpSubKey));
printf("%s", s);
free(s);
Please note that the fixed size array I'm allocating above is bad practise. In production code you should always determine the correct size of the array on the go to prevent buffer overflows or use functions like strncat and strncpy to ensure that you are not copying more data into the buffer than the buffer can hold.
These are not the same thing. What are you trying to do?
The problem is you are trying to append to a string that you have not reserved memory for.
Try:
char s[1024] = "HKEY_CURRENT_USER";
strcat(s,(char*)lpSubKey );
printf("%S",s);
Do be careful with the arbitrary size of 1024. If you expect your keys to be much longer your program will crash.
Also, look at strcat_s.
ATL and MFC has set of macros to such conversion, where used next letters:
W - wide unicode string
T - generic character string
A - ANSI character string
OLE - BSTR string,
so in your case you need T2A macros
strcat does not attempt to make room for the combination. You are overwriting memory that isn't part of the string. Off the top of my head:
char *strcat_with_alloc(char *s1, char *s2)
{
if (!s1 || !s2) return NULL;
size_t len1 = strlen(s1);
size_t len2 = strlen(s2);
char *dest = (char *)malloc(len1 + len2 + 1);
if (!dest) return NULL;
strcpy(dest, s1);
strcat(dest, s2);
return dest;
}
now try:
char* s="HKEY_CURRENT_USER\\";
char *fullKey = strcat_with_alloc(s,(char*)lpSubKey);
if (!fullKey)
printf("error no memory");
else {
printf("%S",fullKey);
free(fullKey);
}

Resources