How to use strnlen safely? - c

I am trying to understand what is the correct way to use strnlen so that it will be used safely even considering edge cases.
Like for example having a non null-terminated string as input.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
void* data = malloc(5);
size_t len = strnlen((const char*)data, 10);
printf("len = %zu\n", len);
return 0;
}
If I expect a string of max size 10, but the string does not contain the null character within those 10 characters strnlen will read out of bounds bytes (the input pointer may point to heap allocated data). Is this behavior undefined? If yes, is there a way to safely use strnlen to compute the length of a string which takes into account this type of scenario and does not lead to undefined behavior?

In order to use strnlen safely you need to:
Keep track of the size of the input buffer yourself (5 in your case) and pass that as the second parameter, not a number greater than that.
Make sure the input pointer is not NULL.
Make sure another thread is not writing to the buffer.
Formally, you don't need to initialise the contents of the buffer, as conceptually the function reads the bytes of the buffer as if they were of character type.
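The checklist above could be wrapped in a small helper. This is only a sketch: checked_strlen is a hypothetical name, not a library function, and it assumes a POSIX system where strnlen is available. The caller passes the real size of the buffer, and the helper reports failure when no terminator is found inside it.

```c
#define _POSIX_C_SOURCE 200809L /* strnlen is POSIX, not ISO C */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true and stores the string length in *len_out
   if a '\0' lies within the first buf_size bytes of buf. Returns false if buf
   is NULL or if the buffer holds no terminator. */
bool checked_strlen(const char *buf, size_t buf_size, size_t *len_out)
{
    assert(len_out != NULL); /* passing no output slot is a programming error */

    if (buf == NULL)
        return false;

    /* buf_size is the real size of the buffer, so strnlen stays in bounds. */
    size_t len = strnlen(buf, buf_size);
    if (len == buf_size)
        return false; /* no '\0' anywhere in the buffer */

    *len_out = len;
    return true;
}
```

For the code in the question, the call would be checked_strlen(data, 5, &len), with 5 being the size that was passed to malloc.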

This code will most likely invoke undefined behavior.
The bytes returned by malloc have indeterminate values. If there are no null bytes in the 5 bytes that are returned, then strnlen will read past those bytes since it was passed a max of 10, and reading past the end of allocated memory invokes undefined behavior.
Simply reading the bytes that were returned however should not be undefined. While indeterminate values could hold a trap representation, strnlen reads the bytes using a char *, and character types do not have trap representation, so the values are merely unspecified and reading them is safe.
If the value passed to strnlen is no larger than the size of allocated memory, then its usage is safe.
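If the goal is to end up with a properly terminated string, one option is to copy the data out. A minimal sketch, assuming POSIX strnlen and using the hypothetical name dup_bounded (it behaves much like POSIX strndup):

```c
#define _POSIX_C_SOURCE 200809L /* strnlen is POSIX, not ISO C */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, null-terminated copy of at
   most size bytes from data, or NULL if allocation fails. The caller frees
   the result. */
char *dup_bounded(const char *data, size_t size)
{
    assert(data != NULL);

    /* size is the real size of the input buffer, so nothing at data[size]
       or beyond is ever read. */
    size_t len = strnlen(data, size);

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, data, len);
    copy[len] = '\0'; /* the copy is always terminated */
    return copy;
}
```

The result can be handed to any ordinary string function, whether or not the original buffer contained a terminator.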

Since the actual length of data is 5 and you most likely don't have a '\0' in there, strnlen will start reading unallocated memory (starting at data[5]), which might be a little unpleasant.

Related

Can strnlen be implemented with memchr?

Is the implementation of strnlen that follows invalid?
size_t strnlen(const char *str, size_t maxlen)
{
char *nul = memchr(str, '\0', maxlen);
return nul ? (size_t)(nul - str) : maxlen;
}
I assume that memchr may always look at maxlen bytes no matter the contents of those bytes. Does the contract of strnlen only allow it to look at all maxlen bytes if there is no NUL terminator? If so, the size in memory of str may be less than maxlen bytes, in which case memchr might try to read invalid memory locations. Is this correct?
Yes, strnlen can be implemented this way. The implementation posted is conforming: memchr() is not supposed to read bytes from str beyond the first occurrence of '\0'.
C17 7.24.5.1 The memchr function
Synopsis
#include <string.h>
void *memchr(const void *s, int c, size_t n);
Description
The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found.
Returns
The memchr function returns a pointer to the located character, or a null pointer if the character does not occur in the object.
memchr may be implemented with efficient techniques that test multiple bytes at a time, potentially reading beyond the first matching byte, but only if this does not cause any visible side effects.
I assume that memchr may always look at maxlen bytes no matter the contents of those bytes.
That assumption is wrong. From POSIX:
Implementations shall behave as if they read the memory byte by byte from the beginning of the bytes pointed to by s and stop at the first occurrence of c (if it is found in the initial n bytes).
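For reference, here is the implementation from the question again as a sketch. It is renamed my_strnlen (a hypothetical name, chosen so it does not clash with the library's own strnlen) and uses a const-qualified pointer for the result of memchr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; same logic as the implementation in the question. */
size_t my_strnlen(const char *str, size_t maxlen)
{
    assert(str != NULL);

    /* memchr behaves as if it stops at the first '\0' it finds. */
    const char *nul = memchr(str, '\0', maxlen);

    /* Distance to the terminator, or maxlen if none was found. */
    return nul ? (size_t)(nul - str) : maxlen;
}
```

With char buf[16] = "abc", my_strnlen(buf, 10) gives 3 and my_strnlen(buf, 2) gives 2.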

Why do I need to append a NUL character to an array?

#include <stdio.h>
#include <string.h>
int main()
{
char x[] = "Happy birthday to You";
char y[25];
char z[15];
printf("The string in array x is: %s\nThe string in array y is: %s\n", x, strcpy(y, x));
strncpy(z, x, 14);
z[14] = '\0'; // Why do I need to do this?
printf("The string in array z is: %s\n", z);
return 0;
}
Why do I need to append a NUL character to the array z?
Also I commented that line and the output didn't change, Why?
And if I do something like z[20] = '\0'; it's compiling and showing me result without any errors. Isn't that illegal access to memory as I declared my z array to be the size of 15?
The C language does not have a native string type. In C, strings are one-dimensional arrays of characters terminated by a null character, '\0'.
Why do I need to append a NUL character to the array z?
C library functions such as strcpy() and strncpy() operate on null-terminated character arrays. strncpy() copies at most count characters from the source to the destination. If it finds the terminating null character before count is reached, it copies it to the destination and pads the remainder of the count with additional null characters. If count is reached before the entire source array has been copied, the resulting character array is not null-terminated, and you need to add the terminating null character at the end of the destination explicitly.
Also I commented that line and the output didn't change, Why?
The only thing I can say is that you were lucky. The last character of the destination (i.e. z[14]) may happen to have all bits 0, but this is not guaranteed on every run of your program. Make sure to add the terminating null character explicitly when using strncpy(), or use another C library function that does it for you, such as snprintf().
And if I do something like z[20] = '\0'; it's compiling and showing me result without any errors. Isn't that illegal access to memory as I declared my z array to be the size of 15?
The size of array z is 15 and you are accessing z[20], i.e. accessing array z beyond its bounds. In C, accessing an array out of bounds is undefined behavior. With undefined behavior, the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
Why do I need to append a NUL character to the array z?
From the strncpy man:
No null-character is implicitly appended at the end of destination if
source is longer than num. Thus, in this case, destination shall not
be considered a null terminated C string (reading it as such would
overflow).
Also I commented that line and the output didn't change, Why?
My guess is that z[14] already happened to contain 0 (you cannot rely on this for arrays with non-static storage duration).
Isn't that illegal access to memory as I declared my z array to be the
size of 15?
Yes, out-of-bounds array accesses have undefined behavior, and can result in crashes or incorrect program output.
If you want to avoid hardcoding the NUL terminator, you can switch to snprintf:
snprintf(z, sizeof z, "%.*s", 14, x);
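snprintf also tells you whether the source was cut off: it returns the number of characters it would have written had the buffer been large enough. A minimal sketch, with copy_fits as a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, always null-terminated.
   Returns true if all of src fit, false if it was truncated. */
bool copy_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size - 1 characters plus the '\0'. */
    int needed = snprintf(dst, dst_size, "%s", src);

    /* needed is the length the full string would have had. */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With char z[15], copy_fits(z, sizeof z, x) leaves "Happy birthday" in z and returns false, because the 21 characters of x did not fit.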
Why do I need to append a NUL character to the array z?
%s in printf prints everything up to a NUL terminator. If you don't add the NUL terminator at the end, printf will keep reading past the end of the array until it happens to find a NUL byte, which invokes undefined behavior.
Also I commented that line and the output didn't change, Why?
Consider yourself unlucky that z[14] happened to hold a NUL byte. This isn't guaranteed, and without the explicit terminator you still invoke undefined behavior.
And if I do something like z[20] = '\0'; it's compiling and showing me result without any errors. Isn't that illegal access to memory as I declared my z array to be the size of 15?
C is a loosely typed language and does not do any bounds checking. All power rests in the hands of the programmer, and it is up to you to write the code correctly.
Yes, it accesses memory illegally. This invokes undefined behavior, which means that anything could happen. Consider yourself unlucky that it didn't crash.
In C, char strings are really null-terminated byte strings. All string functions use the null terminator to know where a string ends, as they can't otherwise know its length (don't forget that arrays decay to pointers to their first element, and pointers carry no information about what they point to except the type).
Also note that there are cases when strncpy will not automatically terminate the destination, which means you have to do it explicitly. Your use in the example is one such case.
If a string is not terminated the string functions can go out of bounds searching for it, and that will lead to undefined behavior. Unfortunately one of the possibilities of UB is to seemingly work.
Lastly, uninitialized local non-static (a.k.a. "automatic") variables, including arrays, will have indeterminate (and seemingly random) values and contents. You could be "lucky" that z[14] just happens to contain a zero.
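Putting the strncpy advice from the answers above together, the copy and the explicit terminator can live in one small function so the second step is never forgotten. This is a sketch only, and copy_truncated is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 characters from src
   and always terminates dst. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* strncpy does not terminate dst if src has dst_size - 1 or more
       characters... */
    strncpy(dst, src, dst_size - 1);

    /* ...so the last element is set unconditionally. */
    dst[dst_size - 1] = '\0';
}
```

With char z[15], copy_truncated(z, sizeof z, x) replaces the strncpy call and the z[14] = '\0' line from the question, and the size no longer has to be written out by hand in two places.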

What is the point of buffer in getline?

http://man7.org/linux/man-pages/man3/getline.3.html
I don't understand the point of the second parameter size_t *n.
Why would you need a buffer between the input (stdin for example) and the output (some character array).
Also, in the example they provide, size_t len = 0;. What is the significance of setting a buffer of size 0?
The point of getline() is that it can reallocate the buffer it receives.
Given a caller doing
size_t n = some_value();
char *buffer = malloc(n);
getline(&buffer, &n, stdin);
The caller supplies an initial buffer of length n. If getline() reallocates, it changes buffer so that it points at the new memory, and changes n to record the new length.
Obviously, this assumes that it is valid to do a realloc() on buffer i.e. that buffer is either NULL or is the value returned by malloc(), calloc(), or realloc().
The significance of setting n to zero AND buffer to NULL is telling getline() that it has been given no buffer. getline() will therefore reallocate if it reads anything.
All of this is actually described in the link you referred to.
getline() needs to know if the array is big enough to hold the line that the user has entered. It gets the current size of the array from the n parameter. If the array isn't big enough, it reallocates it to the required size. It then updates *lineptr and *n to the new array and size. Updating *n allows the caller to know how big the array is for its future use (such as calling getline() in a loop, as in the example).
Remember, C pointers don't include the size of the array they point to. If a function needs to know this, it has to be passed as a parameter.
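A short sketch of the usual calling pattern, assuming a POSIX system. longest_line is a hypothetical helper, not something from the man page. It starts with no buffer at all (NULL and 0), lets getline() grow one buffer across all iterations, and frees it once at the end.

```c
#define _POSIX_C_SOURCE 200809L /* getline is POSIX, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the length of the longest line in fp,
   counting the trailing newline if the line has one. */
size_t longest_line(FILE *fp)
{
    assert(fp != NULL);

    char *line = NULL; /* no buffer yet: getline allocates one */
    size_t cap = 0;    /* capacity of line, kept up to date by getline */
    size_t longest = 0;
    ssize_t len;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if ((size_t)len > longest)
            longest = (size_t)len;
    }

    free(line); /* the same buffer was reused for every line */
    return longest;
}
```

Because cap is passed back in on every call, getline() only reallocates when a line is longer than anything seen before.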

malloc puts "garbage" values

How can I prevent or bypass the garbage values malloc puts in my variable?
I have attached the code and the output.
Thanks!
#include <stdio.h>
#include "stdlib.h"
#include <string.h>
int main() {
char* hour_char = "13";
char* day_char = "0";
char* time = malloc(strlen(hour_char)+strlen(day_char)+2);
time = strcat(time,day_char);
time = strcat(time,"-");
time = strcat(time,hour_char);
printf("%s",time);
free(time);
}
this is the output i get:
á[┼0-13
The first strcat is incorrect, because malloc-ed memory is uninitialized. Rather than using strcat for the first write, use strcpy. It makes sense, because initially time does not have a string to which you concatenate anything.
time = strcpy(time, day_char);
time = strcat(time, "-");
time = strcat(time, hour_char);
Better yet, use sprintf:
sprintf(time, "%s-%s", day_char, hour_char);
First of all, quoting C11, chapter 7.22.3.4 (emphasis mine)
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
So, the content of the memory location is indeterminate. That is the expected behaviour.
Then, the problem starts when you use the same pointer as the argument where a string is expected, i.e, the first argument of strcat().
Quoting chapter 7.24.3.1 (again, emphasis mine)
The strcat function appends a copy of the string pointed to by s2 (including the
terminating null character) to the end of the string pointed to by s1. The initial character
of s2 overwrites the null character at the end of s1.
but, in your case, there's no guarantee of the terminating null in the target, so it causes undefined behavior.
You need to 0-initialize the memory (or at least make the first element a null character) before doing so. You can use calloc(), which returns a pointer to already 0-initialized memory, or at least do time[0] = '\0';.
On a different note, you can also make use of snprintf() which removes the hassle of initial 0-filling.
strcat expects to get passed a null-terminated C string. You pass random garbage to it.
This can easily be fixed by turning your data into a null-terminated string of length 0.
char* time = malloc(strlen(hour_char)+strlen(day_char)+2);
time[0] = '\0';
time = strcat(time,day_char);
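Several answers mention snprintf. A minimal sketch of that variant follows, with join_with_dash as a hypothetical helper name. Since snprintf writes the whole string from scratch, the initial contents of the malloc-ed memory never matter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "day-hour" string,
   or NULL if allocation fails. The caller frees the result. */
char *join_with_dash(const char *day, const char *hour)
{
    assert(day != NULL && hour != NULL);

    /* Both strings, plus one byte for '-' and one for '\0'. */
    size_t size = strlen(day) + strlen(hour) + 2;

    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    /* Overwrites the indeterminate contents and terminates the result. */
    snprintf(out, size, "%s-%s", day, hour);
    return out;
}
```

For the values in the question, join_with_dash("0", "13") produces "0-13".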

Why does scanf and malloc to char pointer work even if the size is not specified?

I am refreshing my C skills. I am using a char *s and using malloc to allocate memory to the s. Then using scanf, I read the input to s. But my question is I haven't specified a size for the memory chunk. But the program works. How does the memory gets allocated for the arbitrary length of the input string? Is scanf simply incrementing the pointer and writing data into the location?
#include <stdio.h>
#include <stdlib.h>
int main() {
char *s;
s = (char *) malloc(sizeof(s)); //I did not specify how much like malloc(sizeof(s) * 128)
if (s == NULL) {
fprintf(stderr, "\nError allocating memory for string");
exit(1);
}
scanf("%s", s);
puts(s);
free(s);
return 0;
}
/*
Input:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Output:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
*/
With char *s;, sizeof(s) is the same as sizeof(char *) which is either 4 or 8 depending on whether you are on a 32 bit box or a 64 bit box.
If you are on a 32-bit box then you can store 3 characters plus the null 'end of string' character. If you store more, it may explode.
sizeof(s) returns the size in bytes of s, which is of type char*. Typically this is 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine. So you actually have told malloc the number of bytes to allocate, and s will point to that region of memory.
You did specify a size: sizeof(s). Since s is a char *, sizeof(s) == sizeof(char *). Depending on your platform, this may be 4 or 8 bytes in length.
So, you've effectively allocated 4 (or 8) bytes to store a string. If you type more than 3 (or 7) characters as input, then you are going to start writing past the end of the allocated array, which triggers undefined behaviour. With undefined behaviour, anything could happen: your program might look like it works fine, the program might fill the rest of the memory with ZALGO, the program might segfault horribly, or you might encounter the ever-popular nasal demons. The C specification does not specify what happens (hence the term "undefined behaviour").
The fact that your program "works" at all is a complete fluke, and should never be relied upon.
sizeof(s) in your case returns the size of a character pointer, which will be 4 or 8 bytes depending on whether you are running on a 32-bit or 64-bit platform.
You want to use sizeof(*s) as the element size instead (for example 128 * sizeof(*s)). However, since the C standard specifies that sizeof(char) (which is what sizeof(*s) is) equals one, you don't need it for character arrays.
It will only allocate space equal to the size of a char *, and then scanf does what you suspected: it simply advances through memory and writes the data there.
The reason it appears to work is that scanf is writing into memory that was not allocated to you. If that memory is in use by something else, your program may corrupt it or crash, so allocate a buffer large enough for the input.
You are only allocating memory equal to the size of a pointer. If you write longer strings into it, scanf will overwrite whatever lies beyond the allocation and your program will show unexpected behavior.
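One way to keep scanf inside the allocation is to give %s a maximum field width that matches the buffer. A minimal sketch: read_word and WORD_CAP are hypothetical names, and 128 is the size mentioned in the question's comment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define WORD_CAP 128 /* hypothetical capacity, including the terminator */

/* Hypothetical helper: reads one whitespace-delimited word of at most
   WORD_CAP - 1 characters from fp. Returns a malloc-ed string that the
   caller frees, or NULL on allocation failure or if no word was read. */
char *read_word(FILE *fp)
{
    assert(fp != NULL);

    char *s = malloc(WORD_CAP);
    if (s == NULL)
        return NULL;

    /* The width 127 must equal WORD_CAP - 1: fscanf stores at most
       127 characters and then appends the '\0'. */
    if (fscanf(fp, "%127s", s) != 1) {
        free(s);
        return NULL;
    }
    return s;
}
```

Longer input is not lost: the characters beyond the first 127 stay in the stream for the next read. If lines of arbitrary length are needed, getline() is the alternative on POSIX systems.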

Resources