Functions strdup() and strndup() have finally made it into the upcoming C23 Standard:
7.24.6.4 The strdup function
Synopsis
#include <string.h>
char *strdup(const char *s);
The strdup function creates a copy of the string pointed to by s in a space allocated as if by a call to malloc.
Returns
The strdup function returns a pointer to the first character of the duplicate string. The returned pointer can be passed to free. If no space can be allocated the strdup function returns a null pointer.
7.24.6.5 The strndup function
Synopsis
#include <string.h>
char *strndup(const char *s, size_t size);
The strndup function creates a string initialized with no more than size initial characters of the array pointed to by s and up to the first null character, whichever comes first, in a space allocated as if by a call to malloc. If the array pointed to by s does not contain a null within the first size characters, a null is appended to the copy of the array.
Returns
The strndup function returns a pointer to the first character of the created string. The returned pointer can be passed to free. If no space can be allocated the strndup function returns a null pointer.
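The wording above can be read as a bounded search for the null character followed by an allocate-and-copy step. Here is a minimal sketch of that reading, not the library's actual implementation. The name sketch_strndup is hypothetical and was chosen so it does not clash with a real strndup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the strndup semantics quoted above. */
char *sketch_strndup(const char *s, size_t size)
{
    assert(s != NULL);  /* debug-time precondition check */

    /* Look for a null character within the first size characters. */
    const char *nul = memchr(s, '\0', size);
    size_t len = nul ? (size_t)(nul - s) : size;

    /* One extra byte so the copy is always null-terminated. */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;  /* the caller releases it with free() */
}
```

The bounded length computed on the first two lines of the body is exactly the value strnlen(s, size) would return.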
Why was the POSIX-2008 function strnlen not considered for inclusion?
#include <string.h>
size_t strnlen(const char *s, size_t maxlen);
The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including the terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.
Interestingly, this function was proposed in https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2351.htm
It was discussed at the London meeting in 2019. See the agenda:
https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2370.htm
The discussion minutes can be found at https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2377.pdf.
Page 59.
It was rejected due to no consensus.
6.33 Sebor, Add strnlen to C2X [N 2351]
...
*Straw poll: Should N2351 be put into C2X?
(11/6/6)
Not clear consensus.
As a result, the function was not added.
One argument against strnlen is that it's a superfluous function, since we already have memchr. Example:
const char str[666] = "hello world";
size_t length1 = strnlen(str,666);
size_t length2 = (char*)memchr(str,'\0',666) - str;
Advantages of memchr:
It has been a standard C function since the dawn of time.
Possibly more efficient than strnlen in some situations (?).
More generic API.
memchr already ought to be in use for the purpose of sanitising supposed string input before calling functions like strcpy, so what purpose strnlen fills is unclear.
Has proper error handling, unlike strnlen which does not tell if it failed or not.
Disadvantages:
More awkward and type-unsafe interface for the purpose of finding a string length specifically.
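To make the error-handling point concrete, here is a small sketch of a memchr-based length query that reports a missing terminator explicitly instead of subtracting from a null pointer. The name bounded_length and its out-parameter are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true and stores the length when a null
   character is found in the first cap bytes, false otherwise. */
bool bounded_length(const char *buf, size_t cap, size_t *len_out)
{
    assert(buf != NULL && len_out != NULL);

    const char *nul = memchr(buf, '\0', cap);
    if (nul == NULL)
        return false;  /* no terminator within cap bytes; *len_out untouched */

    *len_out = (size_t)(nul - buf);
    return true;
}
```

With strnlen the same condition shows up as a return value equal to maxlen, so the caller has to compare against the limit to notice it.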
This seems like a silly question, but I couldn't find the answer.
Anyways, if you set an arbitrary character to null in a string,
then free the string, does that cause a memory leak?
I suppose my knowledge of how the free function works is limited.
/*
char *
strchr(const char *s, int c);
char *
strrchr(const char *s, int c);
The strchr() function locates the first occurrence of c (converted to a
char) in the string pointed to by s. The terminating null character is
considered part of the string; therefore if c is ‘\0’, the functions
locate the terminating ‘\0’.
The strrchr() function is identical to strchr() except it locates the
last occurrence of c.
*/
char* string = strdup ("THIS IS, A STRING WITH, COMMAS!");
char* ch = strrchr( string, ',' );
*ch = 0;
free( string );
/*
The resulting string should be: "THIS IS, A STRING WITH"
When the string pointer is freed, does this result in a memory leak?
*/
Not a stupid question in my opinion.
TL;DR: no, you do not cause a memory leak.
Now the longer answer: free has no idea what a string is. Whether you pass it a char* or an int*, it could not care less.
malloc and free work as follows: when you call malloc you supply a size and receive a pointer, with the promise that that many bytes are reserved on the heap from the position of the pointer onwards. At that point the size and position are also recorded internally in some way (how exactly is an implementation detail).
When you call free it does not need to be told the size; it looks up the entry your pointer belongs to and releases the whole block, whatever the block contains.
Addendum: not every char* points to a string. It just so happens that "abcd" becomes a null-terminated array and a char* pointing to the 'a', but a char* itself points to a single char, not to several.
malloc only allocates the chunk of memory and gives you the reference to it. If you do not read or write outside the boundaries of this chunk you can do whatever you want with it.
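As a sketch of the point above, the following helper copies a string into malloc-ed memory and cuts it at the last occurrence of a character. The name dup_and_truncate is hypothetical. The allocation stays the same size after the cut, and a single free on the returned pointer releases all of it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate src, then terminate the copy at the
   last occurrence of c (if there is one). */
char *dup_and_truncate(const char *src, int c)
{
    assert(src != NULL);

    size_t size = strlen(src) + 1;   /* bytes reserved, including the null */
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, size);

    /* strrchr returns NULL when c does not occur, so check before writing. */
    char *last = strrchr(copy, c);
    if (last != NULL)
        *last = '\0';

    /* The block is still size bytes long; free(copy) releases all of it. */
    return copy;
}
```

Writing the null character only changes what the string functions see; the allocator's bookkeeping for the block is untouched.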
Is the implementation of strnlen that follows invalid?
size_t strnlen(const char *str, size_t maxlen)
{
    char *nul = memchr(str, '\0', maxlen);
    return nul ? (size_t)(nul - str) : maxlen;
}
I assume that memchr may always look at maxlen bytes no matter the contents of those bytes. Does the contract of strnlen only allow it to look at all maxlen bytes if there is no NUL terminator? If so, the size in memory of str may be less than maxlen bytes, in which case memchr might try to read invalid memory locations. Is this correct?
No, the implementation posted is not invalid; it is conforming: memchr() is not supposed to read bytes from str beyond the first occurrence of '\0'.
C17 7.24.5.1 The memchr function
Synopsis
#include <string.h>
void *memchr(const void *s, int c, size_t n);
Description
The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found.
Returns
The memchr function returns a pointer to the located character, or a null pointer if the character does not occur in the object.
memchr may be implemented with efficient techniques that test multiple bytes at a time, potentially reading beyond the first matching byte, but only if this does not cause any visible side effects.
I assume that memchr may always look at maxlen bytes no matter the contents of those bytes.
That assumption is wrong. From POSIX:
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).
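The "as if" wording in both quotes can be modelled by a plain byte-by-byte loop. The sketch below is a reference model only (ref_memchr is a hypothetical name), not how any particular library implements memchr. It shows why a count larger than the object is harmless as long as a match occurs inside the object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model of the required observable behaviour. */
void *ref_memchr(const void *s, int c, size_t n)
{
    assert(s != NULL || n == 0);

    const unsigned char *p = s;
    const unsigned char target = (unsigned char)c;

    for (size_t i = 0; i < n; i++) {
        /* Bytes are examined in order; nothing past a match is read. */
        if (p[i] == target)
            return (void *)(p + i);
    }
    return NULL;  /* c does not occur in the first n bytes */
}
```

An optimised implementation may read wider chunks internally, but it has to produce the same result and must not fault where this loop would not.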
I do not understand why strlcat and strlcpy have to return an unsigned int. Why do we need that? It is not the aim of the functions.
Thanks for your responses
The aim of these functions is to move strings (in some way), and a very frequent question by the programmer when you do that is "how many chars were moved?".
Since it already knows this information, it is no more work for it to return this information. Otherwise the programmer would have to do some expensive strlen()'s before and/or after.
Also, if they fail due to a too small buffer, they give you the number of characters they would need, so you can detect the truncation, and potentially reallocate the buffer.
strlcpy and strlcat are non standard functions available on some BSD versions of Unix that perform a safe version of strcpy and strcat with truncation. They do not actually return an unsigned int, but a size_t, which is the type returned by sizeof and may be different from unsigned int.
The functions are declared in <string.h> with these prototypes:
size_t strlcpy(char *dst, const char *src, size_t size);
size_t strlcat(char *dst, const char *src, size_t size);
dst is a pointer to the destination array. This array must contain a valid C string for strlcat, it can be NULL if size is 0.
src must be a valid pointer to a C string.
size is the size in bytes of the destination array.
The functions perform a string copy or concatenation, similar to strcpy and strcat, but do not write beyond the end of the destination array (size). They null terminate the destination array unless size is 0 or dst points to an array without a null terminator in the first size bytes for strlcat.
Both functions return the length in bytes that the resulting string would have if the destination array were long enough. Comparing that value with size permits easy detection of truncation.
Here is an example:
char dest[10];
size_t len = strlcpy(dest, "This is a random string", sizeof dest);
if (len >= sizeof dest) {
    /* Truncation occurred. You could ignore it, issue a diagnostic,
       or reallocate the destination array to at least `len+1` bytes */
    printf("Truncation occurred\n");
}
So here is the answer to your question: the return value is useful if the programmer wants to detect truncation, otherwise it may be safely ignored.
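Since strlcpy is not available everywhere, here is a minimal sketch of the behaviour described above. It is written from that description rather than taken from any BSD source, and the name sketch_strlcpy is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: copy with truncation, always return strlen(src). */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    assert(src != NULL);

    size_t srclen = strlen(src);

    if (size != 0) {
        /* Copy at most size - 1 bytes and always terminate. */
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }

    /* A value >= size tells the caller that truncation occurred. */
    return srclen;
}
```

Because the return value is the full source length, a caller that detects truncation already knows that a buffer of that length plus one byte would be enough.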
Synopsis of strlcpy() and strlcat()
#include <string.h>
size_t
strlcpy(char *dst, const char *src, size_t size);
size_t
strlcat(char *dst, const char *src, size_t size);
The strlcpy() and strlcat() functions return the total length of the
string they tried to create. For strlcpy() that means the length of src.
For strlcat() that means the initial length of dst plus the length of
src. While this may seem somewhat confusing it was done to make truncation detection simple.
Note however, that if strlcat() traverses size characters without finding
a NUL, the length of the string is considered to be size and the destination string will not be NUL terminated (since there was no space for the
NUL). This keeps strlcat() from running off the end of a string. In
practice this should not happen (as it means that either size is incorrect or that dst is not a proper ``C'' string). The check exists to prevent potential security problems in incorrect code.
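The strlcat rules in this excerpt, including the case where no NUL is found within size characters, can be sketched as follows. This is an illustration written from the text above, not the BSD implementation, and sketch_strlcat is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the strlcat semantics described above. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    assert(src != NULL);

    size_t srclen = strlen(src);

    /* Search for the end of dst, but never look past size bytes. */
    const char *nul = size != 0 ? memchr(dst, '\0', size) : NULL;
    if (nul == NULL)
        return size + srclen;  /* dst treated as length size; nothing written */

    size_t dstlen = (size_t)(nul - dst);
    size_t room = size - dstlen - 1;             /* bytes free before the null */
    size_t n = srclen < room ? srclen : room;

    memcpy(dst + dstlen, src, n);
    dst[dstlen + n] = '\0';

    return dstlen + srclen;  /* length the function tried to create */
}
```

As with strlcpy, a return value greater than or equal to size signals truncation.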
I am trying to understand what is the correct way to use strnlen so that it will be used safely even considering edge cases.
Like for example having a non null-terminated string as input.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
    void* data = malloc(5);
    size_t len = strnlen((const char*)data, 10);
    printf("len = %zu\n", len);
    return 0;
}
If I expect a string of max size 10, but the string does not contain the null character within those 10 characters strnlen will read out of bounds bytes (the input pointer may point to heap allocated data). Is this behavior undefined? If yes, is there a way to safely use strnlen to compute the length of a string which takes into account this type of scenario and does not lead to undefined behavior?
In order to use strnlen safely you need to
Keep track of the size of the input buffer yourself (5 in your case) and pass that as the second parameter, not a number greater than that.
Make sure the input pointer is not NULL.
Make sure another thread is not writing to the buffer.
Formally, you don't need to initialise the contents of the buffer, as conceptually the function reads the buffer as if they are char types.
This code will most likely invoke undefined behavior.
The bytes returned by malloc have indeterminate values. If there are no null bytes in the 5 bytes that are returned, then strnlen will read past those bytes since it was passed a max of 10, and reading past the end of allocated memory invokes undefined behavior.
Simply reading the bytes that were returned, however, should not be undefined. While indeterminate values could hold a trap representation, strnlen reads the bytes using a char *, and character types do not have trap representations, so the values are merely unspecified and reading them is safe.
If the value passed to strnlen is no larger than the size of allocated memory, then its usage is safe.
Since the actual length of data is 5 and you most likely don't have a '\0' in there, it will start reading unallocated memory (starting at data[5]), which might be a little unpleasant.
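The advice in these answers comes down to never passing a limit larger than the buffer you actually own. Here is a minimal sketch of that rule. The helper length_in_buffer is hypothetical, and it uses memchr in place of strnlen so that it depends only on standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bounded length that can never read past buf_size,
   even when the caller's expected maximum is larger. */
size_t length_in_buffer(const char *buf, size_t buf_size, size_t max_len)
{
    assert(buf != NULL);

    /* Clamp the search limit to the size of the real allocation. */
    size_t limit = max_len < buf_size ? max_len : buf_size;

    const char *nul = memchr(buf, '\0', limit);
    return nul ? (size_t)(nul - buf) : limit;
}
```

For the program in the question this would be called as length_in_buffer(data, 5, 10). A result equal to the clamped limit means no terminator was found inside the buffer.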
How can I prevent or bypass the garbage values malloc puts in my variable?
The code and the output are attached.
Thanks!
#include <stdio.h>
#include "stdlib.h"
#include <string.h>
int main() {
    char* hour_char = "13";
    char* day_char = "0";
    char* time = malloc(strlen(hour_char)+strlen(day_char)+2);
    time = strcat(time,day_char);
    time = strcat(time,"-");
    time = strcat(time,hour_char);
    printf("%s",time);
    free(time);
}
this is the output i get:
á[┼0-13
The first strcat is incorrect, because malloc-ed memory is uninitialized. Rather than using strcat for the first write, use strcpy. It makes sense, because initially time does not have a string to which you concatenate anything.
time = strcpy(time, day_char);
time = strcat(time, "-");
time = strcat(time, hour_char);
Better yet, use sprintf:
sprintf(time, "%s-%s", day_char, hour_char);
First of all, quoting C11, chapter 7.22.3.4 (emphasis mine)
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
So, the content of the memory location is indeterminate. That is the expected behaviour.
Then, the problem starts when you use the same pointer as the argument where a string is expected, i.e, the first argument of strcat().
Quoting chapter 7.24.3.1 (again, emphasis mine)
The strcat function appends a copy of the string pointed to by s2 (including the
terminating null character) to the end of the string pointed to by s1. The initial character
of s2 overwrites the null character at the end of s1.
but, in your case, there's no guarantee of the terminating null in the target, so it causes undefined behavior.
You need to 0-initialize the memory (or at least make its first element a null character) before doing so. You can use calloc(), which returns a pointer to already 0-initialized memory, or, at the least, do time[0] = '\0';.
On a different note, you can also make use of snprintf() which removes the hassle of initial 0-filling.
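One way the snprintf() suggestion could look is sketched below. It asks snprintf for the required length first and then formats into a buffer of exactly that size. The function name join_with_dash is hypothetical, and the "%s-%s" format is taken from the sprintf example earlier in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "day-hour" in freshly allocated memory. */
char *join_with_dash(const char *day, const char *hour)
{
    assert(day != NULL && hour != NULL);

    /* With a null buffer and size 0, snprintf only reports the length needed. */
    int need = snprintf(NULL, 0, "%s-%s", day, hour);
    if (need < 0)
        return NULL;

    char *out = malloc((size_t)need + 1);
    if (out == NULL)
        return NULL;

    /* snprintf writes the whole string including the terminating null,
       so the indeterminate initial contents of out never matter. */
    snprintf(out, (size_t)need + 1, "%s-%s", day, hour);
    return out;  /* the caller releases it with free() */
}
```

Calling join_with_dash("0", "13") yields the string "0-13" without any separate initialization step.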
strcat expects to get passed a null-terminated C string. You pass random garbage to it.
This can easily be fixed by turning your data into a null-terminated string of length 0.
char* time = malloc(strlen(hour_char)+strlen(day_char)+2);
time[0] = '\0';
time = strcat(time,day_char);