Can strnlen be implemented with memchr? - c

Is the implementation of strnlen that follows invalid?
size_t strnlen(const char *str, size_t maxlen)
{
    char *nul = memchr(str, '\0', maxlen);
    return nul ? (size_t)(nul - str) : maxlen;
}
I assume that memchr may always look at maxlen bytes no matter the contents of those bytes. Does the contract of strnlen only allow it to look at all maxlen bytes if there is no NUL terminator? If so, the size in memory of str may be less than maxlen bytes, in which case memchr might try to read invalid memory locations. Is this correct?

No, it is not invalid. The implementation posted is conforming: memchr() is not supposed to read bytes from str beyond the first occurrence of '\0'.
C17 7.24.5.1 The memchr function
Synopsis
#include <string.h>
void *memchr(const void *s, int c, size_t n);
Description
The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found.
Returns
The memchr function returns a pointer to the located character, or a null pointer if the character does not occur in the object.
memchr may be implemented with efficient techniques that test multiple bytes at a time, potentially reading beyond the first matching byte, but only if this does not cause any visible side effects.

I assume that memchr may always look at maxlen bytes no matter the contents of those bytes.
That assumption is wrong. From POSIX:
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).
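To make the point above concrete, here is a minimal sketch of the memchr-based approach. The name my_strnlen is hypothetical, chosen only to avoid clashing with a strnlen that the C library may already provide.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, used here to avoid clashing with a libc strnlen. */
size_t my_strnlen(const char *str, size_t maxlen)
{
    /* memchr must behave as if it reads byte by byte and stops at the
       first '\0', so the result never depends on bytes after it. */
    const char *nul = memchr(str, '\0', maxlen);

    /* Terminator found: its offset is the length. Otherwise: maxlen. */
    return nul ? (size_t)(nul - str) : maxlen;
}
```

With a terminated string the result matches strlen as long as maxlen is large enough; with a smaller maxlen the result is capped at maxlen.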

Related

Why is strnlen() not considered for inclusion in C23?

Functions strdup() and strndup() have finally made it into the upcoming C23 Standard:
7.24.6.4 The strdup function
Synopsis
#include <string.h>
char *strdup(const char *s);
The strdup function creates a copy of the string pointed to by s in a space allocated as if by a call to malloc.
Returns
The strdup function returns a pointer to the first character of the duplicate string. The returned pointer can be passed to free. If no space can be allocated the strdup function returns a null pointer.
7.24.6.5 The strndup function
Synopsis
#include <string.h>
char *strndup(const char *s, size_t size);
The strndup function creates a string initialized with no more than size initial characters of the array pointed to by s and up to the first null character, whichever comes first, in a space allocated as if by a call to malloc. If the array pointed to by s does not contain a null within the first size characters, a null is appended to the copy of the array.
Returns
The strndup function returns a pointer to the first character of the created string. The returned pointer can be passed to free. If no space can be allocated the strndup function returns a null pointer.
Why was the POSIX-2008 function strnlen not considered for inclusion?
#include <string.h>
size_t strnlen(const char *s, size_t maxlen);
The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including the terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.
Interestingly, this function was proposed in https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2351.htm
It was discussed at the London meeting in 2019. See the agenda:
https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2370.htm
The discussion minutes can be found at https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2377.pdf.
Page 59.
It was rejected due to no consensus.
6.33 Sebor, Add strnlen to C2X [N 2351]
...
*Straw poll: Should N2351 be put into C2X?
(11/6/6)
Not clear consensus.
As a result, the function was not added.
One argument against strnlen is that it's a superfluous function, since we already have memchr. Example:
const char str[666] = "hello world";
size_t length1 = strnlen(str,666);
size_t length2 = (char*)memchr(str,'\0',666) - str;
Advantages of memchr:
Already been a standard C function since the dawn of time.
Possibly more efficient than strnlen in some situations(?).
More generic API.
memchr already ought to be in use for the purpose of sanitising supposed string input before calling functions like strcpy, so what purpose strnlen fills is unclear.
Has proper error handling, unlike strnlen which does not tell if it failed or not.
Disadvantages:
More awkward and type-unsafe interface for the purpose of finding a string length specifically.
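The "error handling" point can be sketched as follows: memchr returns a null pointer when no terminator is found, so the caller can tell "not terminated" apart from "length equals the limit". The helper name find_length and its out-parameter are assumptions made for this illustration, not part of any standard API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks for a terminator in buf[0 .. size-1].
   Returns true and stores the string length if one is found,
   returns false (leaving *length_out untouched) otherwise. */
bool find_length(const char *buf, size_t size, size_t *length_out)
{
    const char *nul = memchr(buf, '\0', size);
    if (nul == NULL)
        return false;            /* no terminator within size bytes */
    *length_out = (size_t)(nul - buf);
    return true;
}
```

Note that the one-liner with the cast and subtraction shown earlier is only valid when the terminator is known to be present; subtracting from a null pointer is undefined.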

Why unsigned int and not a void for strlcat and strlcpy?

I do not understand why strlcat and strlcpy have to return an unsigned int. Why do we need that? It's not the aim of the function.
Thanks for your responses
The aim of these functions is to move strings (in some way), and a very frequent question by the programmer when you do that is "how many chars were moved?".
Since it already knows this information, it is no more work for it to return this information. Otherwise the programmer would have to do some expensive strlen()'s before and/or after.
Also, if they fail due to a too small buffer, they give you the number of characters they would need, so you can detect the truncation, and potentially reallocate the buffer.
strlcpy and strlcat are non standard functions available on some BSD versions of Unix that perform a safe version of strcpy and strcat with truncation. They do not actually return an unsigned int, but a size_t, which is the type returned by sizeof and may be different from unsigned int.
The functions are declared in <string.h> with these prototypes:
size_t strlcpy(char *dst, const char *src, size_t size);
size_t strlcat(char *dst, const char *src, size_t size);
dst is a pointer to the destination array. This array must contain a valid C string for strlcat, it can be NULL if size is 0.
src must be a valid pointer to a C string.
size is the size in bytes of the destination array.
The functions perform a string copy or concatenation, similar to strcpy and strcat, but do not write beyond the end of the destination array (size). They null terminate the destination array unless size is 0 or dst points to an array without a null terminator in the first size bytes for strlcat.
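Both functions return the length in bytes that the resulting string would have if the destination array were long enough. A return value greater than or equal to size therefore means the result was truncated, which permits easy detection of truncation.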
Here is an example:
char dest[10];
size_t len = strlcpy(dest, "This is a random string", sizeof dest);
if (len >= sizeof dest) {
    /* Truncation occurred. You could ignore it, issue a diagnostic,
       or reallocate the destination array to at least `len+1` bytes */
    printf("Truncation occurred\n");
}
So here is the answer to your question: the return value is useful if the programmer wants to detect truncation, otherwise it may be safely ignored.
Synopsis of strlcpy() and strlcat()
#include <string.h>
size_t
strlcpy(char *dst, const char *src, size_t size);
size_t
strlcat(char *dst, const char *src, size_t size);
The strlcpy() and strlcat() functions return the total length of the
string they tried to create. For strlcpy() that means the length of src.
For strlcat() that means the initial length of dst plus the length of
src. While this may seem somewhat confusing it was done to make truncation detection simple.
Note however, that if strlcat() traverses size characters without finding
a NUL, the length of the string is considered to be size and the destination string will not be NUL terminated (since there was no space for the
NUL). This keeps strlcat() from running off the end of a string. In
practice this should not happen (as it means that either size is incorrect or that dst is not a proper ``C'' string). The check exists to prevent potential security problems in incorrect code.
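Since strlcpy is not available in every C library, the return-value convention described above can be illustrated with a small stand-in. This is only a sketch under that assumption: my_strlcpy and copy_fits are hypothetical names, and the code is not the BSD implementation.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in (hypothetical name) with strlcpy-like semantics:
   copies at most size-1 bytes, always terminates when size > 0, and
   returns the length of src, i.e. the length it tried to create. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t copy_len = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, copy_len);
        dst[copy_len] = '\0';
    }
    return src_len;
}

/* Returns 1 if src fit into dst without truncation, 0 otherwise. */
int copy_fits(char *dst, size_t size, const char *src)
{
    return my_strlcpy(dst, src, size) < size;
}
```

Because the return value is the full length of src, a caller that sees truncation knows exactly how large a buffer to allocate for a retry (return value plus one).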

How to use strnlen safely?

I am trying to understand what is the correct way to use strnlen so that it will be used safely even considering edge cases.
Like for example having a non null-terminated string as input.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
    void* data = malloc(5);
    size_t len = strnlen((const char*)data, 10);
    printf("len = %zu\n", len);
    return 0;
}
If I expect a string of max size 10, but the string does not contain the null character within those 10 characters strnlen will read out of bounds bytes (the input pointer may point to heap allocated data). Is this behavior undefined? If yes, is there a way to safely use strnlen to compute the length of a string which takes into account this type of scenario and does not lead to undefined behavior?
In order to use strnlen safely you need to
Keep track of the size of the input buffer yourself (5 in your case) and pass that as the second parameter, not a number greater than that.
Make sure the input pointer is not NULL.
Make sure another thread is not writing to the buffer.
Formally, you don't need to initialise the contents of the buffer, as conceptually the function reads the buffer as if they are char types.
This code will most likely invoke undefined behavior.
The bytes returned by malloc have indeterminate values. If there are no null bytes in the 5 bytes that are returned, then strnlen will read past those bytes since it was passed a max of 10, and reading past the end of allocated memory invokes undefined behavior.
Simply reading the bytes that were returned however should not be undefined. While indeterminate values could hold a trap representation, strnlen reads the bytes using a char *, and character types do not have trap representation, so the values are merely unspecified and reading them is safe.
If the value passed to strnlen is no larger than the size of allocated memory, then its usage is safe.
Since the actual length of data is 5 and you most likely don't have a '\0' in there, it will start reading unallocated memory (starting at data[5]), which might be a little unpleasant.
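Putting the advice from these answers together, a minimal sketch could look like this. The wrapper name buffer_strnlen is hypothetical, and the feature-test macro is an assumption about how strnlen gets declared on a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L /* strnlen is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: buf_size must be the real size of the buffer
   that buf points to, never a larger "expected maximum". */
size_t buffer_strnlen(const char *buf, size_t buf_size)
{
    if (buf == NULL)
        return 0;                    /* strnlen itself must not get NULL */
    return strnlen(buf, buf_size);   /* never reads past buf_size bytes */
}
```

For the 5-byte allocation in the question, that means calling it with 5, not 10. A result equal to the buffer size then signals that no terminator was found inside the buffer.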

How to get the string size in bytes?

As the title implies, my question is how to get the size of a string in C. Is it good to use sizeof if I've declared it (the string) in a function without malloc in it? Or, if I've declared it as a pointer? What if I initialized it with malloc? I would like to have an exhaustive response.
You can use strlen. The size is determined by the terminating null character, so the string passed must be valid.
If you want the size of the memory buffer that contains your string, and you only have a pointer to it:
If it is a dynamic array (created with malloc), it is impossible to get its size from the pointer, since the compiler doesn't know what the pointer is pointing at. You have to keep track of the size yourself.
If it is a static array, you can use sizeof to get its size.
Use strlen to get the length of a null-terminated string.
sizeof returns the size of the array, not the length of the string. If it's a pointer (char *s), not an array (char s[]), it won't work, since it will return the size of the pointer (usually 4 bytes on 32-bit systems and 8 bytes on 64-bit systems). An array passed to a function decays to a pointer, so inside the function you lose the ability to use sizeof to check the size of the array.
So, only if the string spans the entire array (e.g. char s[] = "stuff"), would using sizeof for a statically defined array return what you want (and be faster as it wouldn't need to loop through to find the null-terminator) (if the last character is a null-terminator, you will need to subtract 1). If it doesn't span the entire array, it won't return what you want.
An alternative to all this is actually storing the size of the string.
While sizeof works for this specific type of string:
char str[] = "content";
int charcount = sizeof str - 1; // -1 to exclude terminating '\0'
It does not work if str is a pointer (sizeof returns the size of the pointer, usually 4 or 8) or an array with a specified length (sizeof returns the specified length in bytes, which for char is the same as the element count, regardless of how long the string is).
Just use strlen().
If you use sizeof(), then char *str and char str[] will give different answers. For char str[] initialized from a literal it returns the length of the string including the string terminator, while for char *str it returns the size of the pointer (which depends on the platform).
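The difference between the cases discussed in these answers can be seen side by side in a short sketch. The variable names are made up for the illustration.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative names only. */
static const char as_array[] = "content";   /* array sized by initializer */
static const char *as_pointer = "content";  /* pointer to a literal */
static const char padded[32] = "content";   /* array with explicit length */

void print_sizes(void)
{
    /* sizeof measures the object; strlen measures the string inside it. */
    printf("as_array:   sizeof=%zu strlen=%zu\n",
           sizeof as_array, strlen(as_array));
    printf("as_pointer: sizeof=%zu strlen=%zu\n",
           sizeof as_pointer, strlen(as_pointer));
    printf("padded:     sizeof=%zu strlen=%zu\n",
           sizeof padded, strlen(padded));
}
```

Only for as_array does sizeof minus one equal the string length. For as_pointer it is the size of a pointer, and for padded it is always 32.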
I like to use:
(strlen(string) + 1 ) * sizeof(char)
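This gives you the buffer size in bytes, which you can pass to malloc() and snprintf(). Note that the buffer must be sized for the formatted result, not for the format string:
const char* message = "%s, World!";
const char* name = "Hello";
/* "%s" (2 bytes) is replaced by name; +1 for the terminating '\0' */
size_t size = (strlen(message) - 2 + strlen(name) + 1) * sizeof(char);
char* string = (char*)malloc(size);
if (string != NULL) snprintf(string, size, message, name);
Cheers!
Function: size_t strlen (const char *s)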
There are two ways of finding the string size bytes:
1st Solution:
# include <iostream>
# include <cctype>
# include <cstring>
using namespace std;
int main()
{
char str[] = {"A lonely day."};
cout<<"The string bytes for str[] is: "<<strlen(str);
return 0;
}
2nd Solution:
# include <iostream>
# include <cstring>
using namespace std;
int main()
{
char str[] = {"A lonely day."};
cout<<"The string bytes for str[] is: "<<sizeof(str);
return 0;
}
The two solutions produce different outputs. The explanation follows.
The 1st solution uses strlen and, according to cplusplus.com,
The length of a C string is determined by the terminating null-character: A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
That explains why the 1st solution prints the string length (13), while the 2nd solution prints a value that is one larger.
The 2nd solution uses sizeof to find the size in bytes. According to the SO answer listed in the references (paraphrased):
sizeof("f") must return 2: one byte for the 'f' and one for the terminating '\0' (terminating null-character).
That is why the output is 14: 13 bytes for the characters of the string and one for '\0'.
Conclusion:
To get the string length from the 2nd solution, you must use sizeof(str)-1.
References:
Sizeof string literal
https://cplusplus.com/reference/cstring/strlen/?kw=strlen

C Noob: Define size of a char array while copying contents of char* into it

I have a basic question in C.
I need to print the contents of a char pointer. The contents are binary and therefore I use hex format to see the contents.
Would detecting a null still work?
unsigned char *input = "������";
printf("input =");
int count = 0;
while(*input != '\0'){
printf("%02x", *input);
input++;
}
printf("\n");
Now what happens if I have to copy the pointer to a char array?
How can I assign the size of the char array? I understand sizeof returns only the size of datatype that char points to. But is there any way?
unsigned char copyInput[size??];
strcpy(copyInput, input);
for (i=0, i <size?, i++)
{
printf("copyInput[%d]= %02x", i, copyInput[i]);
}
Thanks in advance!
1) To the extent that C has strings at all, they are defined as "an arbitrary contiguous sequence of nonzero bytes, terminated with a zero byte". Therefore, if your binary data is guaranteed never to contain bytes whose value is zero, you can safely treat it as a C string (use the str* functions with it, etc). But if your binary data might have zero bytes somewhere in the middle, you need to track the length separately and operate on it with the mem* functions instead.
2) You use strlen to find the length of the string (without the terminating zero byte). However, in standard C89 you can't use the result of strlen to set the size of a char[] variable, because the size has to be known at compile time. If you're using C99 or GNU extensions, you can define the size of an array at runtime:
size_t n = strlen(s1);
char s2[n+1];
memcpy(s2, s1, n+1); /* n+1 so the terminating NUL is copied too */
The n+1 is necessary, or you won't have space for the terminating NUL. If you can't use C99 nor GNU extensions, your only option is to allocate space on the heap:
size_t n = strlen(s1);
char *s2 = malloc(n+1);
memcpy(s2, s1, n+1); /* n+1 so the terminating NUL is copied too */
or, with a common library extension, just
char *s2 = strdup(s1);
Either way, don't forget to free(s2) later. By the way, this is a case where it would have been safe to use strcpy, because you know by construction that the destination buffer is big enough. I used memcpy because it may be slightly more efficient and it means human readers won't see "strcpy" and start worrying.
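The heap-allocation variant can be wrapped in a small helper, roughly as below. The name copy_string is hypothetical; where strdup is available it does the same job.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if the
   allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t n = strlen(src);          /* length without the terminator */
    char *copy = malloc(n + 1);      /* +1 for the terminating NUL */

    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n + 1);        /* copies the NUL as well */
    return copy;
}
```

Checking the malloc result before copying matters here, because memcpy into a null pointer is undefined behavior.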
If it's a bunch of chars terminated with a 0, just use strlen() since that is C's definition of a string. It doesn't matter that some (or most) of the characters might be unprintable, as long as 0 is the terminator.
You will have problems if any of the input bytes are 0. In this case the loop will stop at that character. Otherwise, you can treat it as a string.
Treating it as a string, you can use strlen() to get the input's size and then dynamically allocate memory for your copy. The copy can be made with strcpy as you did, but it is safer to use strncpy, bounded by the size of the destination.
char *input = "input binary array";
size_t count = strlen(input) + 1; // plus '\0'
char *copy = malloc(count);
strncpy(copy, input, count); // count already includes the '\0'
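If the binary data may contain zero bytes, the string approach breaks down and the length has to be carried separately, as noted in the answers above. A minimal sketch of a hex dump that works on an explicit length follows; hex_encode and its parameters are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes two hex digits per input byte into out,
   followed by a '\0'. Returns 0 on success, or -1 if out is too small
   (it needs 2 * len + 1 bytes). Zero bytes in data are handled like
   any other value because the length is passed explicitly. */
int hex_encode(const unsigned char *data, size_t len,
               char *out, size_t out_size)
{
    if (out_size == 0 || len > (out_size - 1) / 2)
        return -1;
    for (size_t i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned)data[i]);
    out[2 * len] = '\0';
    return 0;
}
```

A loop that stops at '\0', like the one in the question, would stop after the first zero byte in the data; this version processes all len bytes.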

Resources