One of the two problems that the strtok_s function (C11) solves is that it prevents storing outside of the input string. As I understand it, this would only be possible if you pass a non-null-terminated string to strtok.
Is it correct that if I only ever pass properly null-terminated strings to strtok, there is no risk of it writing outside of the input string?
Let's start with the main question, about strtok writing beyond the size of the buffer containing the string.
strtok actually modifies the input string: it writes a string terminator ('\0') where the delimiter character used to be. This is how it can return null-terminated tokens to the user.
If bad input is provided (a buffer in which the string terminator is missing), it could write beyond the end of the input buffer: it keeps reading until a '\0' is found in memory, and it writes a '\0' if it finds a delimiter before reaching that point.
Strictly speaking, we cannot say that "strtok_s prevents storing outside of the input string", but we can say that this function provides a way to limit the number of bytes of the input string that are examined and, as a consequence, written (as explained above).
This control is similar to what we get by using strncpy instead of strcpy: we can pass strtok_s the maximum size of the input string, avoiding memory corruption if the string terminator is missing.
Let's have a look at the strtok_s() signature:
char *strtok_s(char *restrict str, rsize_t *restrict strmax,
const char *restrict delim, char **restrict ptr);
Compared to strtok's interface, it has two more parameters. The ptr parameter makes the function reentrant and is also present in strtok_r. It is not directly related to this question.
The strmax parameter is the one we are interested in:
strmax - pointer to an object which initially holds the size of str: strtok_s stores the number of characters that remain to be examined
(the emphasis is mine).
So, passing as strmax a pointer to a variable initialized with the size of the char buffer containing the input string ensures that a write beyond that size will never occur.
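Since strtok_s is not available in every C library, here is a minimal sketch of the same idea using only standard strtok. The helper name count_tokens_bounded is hypothetical (it is not a library function): it refuses to tokenize unless a '\0' is found within the stated buffer size, so strtok can never read or write past the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any library.
   Counts the tokens in buf, but only if a '\0' exists within the first
   bufsize bytes. Returns -1 if buf does not hold a string. */
int count_tokens_bounded(char *buf, size_t bufsize, const char *delim)
{
    if (memchr(buf, '\0', bufsize) == NULL)
        return -1;   /* no terminator: strtok could run past the buffer */

    int count = 0;
    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        count++;     /* each successful call may overwrite a delimiter with '\0' */
    return count;
}
```

Called on a properly terminated array such as char s[] = "a,b,,c"; with sizeof s, this counts three tokens and leaves s modified in place, which is the write behavior described above.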
I understand that strings in C are just character arrays. So I tried the following code, but it gives strange results, such as garbage output or program crashes:
#include <stdio.h>
int main (void)
{
char str [5] = "hello";
puts(str);
}
Why doesn't this work?
It compiles cleanly with gcc -std=c17 -pedantic-errors -Wall -Wextra.
Note: This post is meant to be used as a canonical FAQ for problems stemming from a failure to allocate room for a NUL terminator when declaring a string.
A C string is a character array that ends with a null terminator.
All characters have a symbol table value. The null terminator is the symbol value 0 (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere.
Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character. Your example does not do this, it only allocates room for the 5 characters of "hello". Correct code should be:
char str[6] = "hello";
Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:
char str[5+1] = "hello";
But you can also use this and let the compiler do the counting and pick the size:
char str[] = "hello"; // Will allocate 6 bytes automatically
When allocating memory for a string dynamically in run-time, you also need to allocate room for the null terminator:
char input[n] = ... ;
...
char* str = malloc(strlen(input) + 1);
If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes.
The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: '\0'. This is 100% equivalent to writing 0, but the \ serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as if(str[i] == '\0') will check if the specific character is the null terminator.
Please note that the term null terminator has nothing to do with null pointers or the NULL macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as NUL with one L, not to be confused with NULL or null pointers. See answers to this SO question for further details.
The "hello" in your code is called a string literal. This is to be regarded as a read-only string. The "" syntax means that the compiler will append a null terminator in the end of the string literal automatically. So if you print out sizeof("hello") you will get 6, not 5, because you get the size of the array including a null terminator.
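As a rough sketch of the sizing rule, the two helpers below (room_needed and copy_to_heap are names made up for this example) compute the space a string needs and make a heap copy with room for the terminator:

```c
#include <stdlib.h>
#include <string.h>

/* Bytes needed to store s as a string: its characters plus the terminator. */
size_t room_needed(const char *s)
{
    return strlen(s) + 1;
}

/* Hypothetical helper: heap copy of s, including its terminator.
   Returns NULL if allocation fails; the caller frees the result. */
char *copy_to_heap(const char *s)
{
    size_t n = room_needed(s);
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);   /* n includes the '\0', so the copy is a string */
    return p;
}
```

For "hello", room_needed gives the same value as sizeof("hello"), because both count the null terminator.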
It compiles cleanly with gcc
Indeed, not even a warning. This is because of a subtle detail/flaw in the C language that allows character arrays to be initialized with a string literal that contains exactly as many characters as there is room in the array and then silently discard the null terminator (C17 6.7.9/15). The language is purposely behaving like this for historical reasons, see Inconsistent gcc diagnostic for string initialization for details. Also note that C++ is different here and does not allow this trick/flaw to be used.
From the C Standard (7.1.1 Definitions of terms)
1 A string is a contiguous sequence of characters terminated by and
including the first null character. The term multibyte string is
sometimes used instead to emphasize special processing given to
multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial
(lowest addressed) character. The length of a string is the number of
bytes preceding the null character and the value of a string is the
sequence of the values of the contained characters, in order.
In this declaration
char str [5] = "hello";
the string literal "hello" has the internal representation like
{ 'h', 'e', 'l', 'l', 'o', '\0' }
so it has 6 characters including the terminating zero. Its elements are used to initialize the character array str, which reserves space for only 5 characters.
The C Standard (unlike the C++ Standard) allows such an initialization of a character array, in which the terminating zero of the string literal is not used as an initializer.
However as a result the character array str does not contain a string.
If you want the array to contain a string, you could write
char str [6] = "hello";
or just
char str [] = "hello";
In the last case the size of the character array is determined from the number of initializers supplied by the string literal, which is 6.
Can all strings be considered arrays of characters? Yes. Can all character arrays be considered strings? No.
Why Not? and Why does it matter?
In addition to the other answers explaining that the length of a string is not stored anywhere as part of the string and the references to the standard where a string is defined, the flip-side is "How do the C library functions handle strings?"
While a character array can hold the same characters, it is simply an array of characters unless the last character is followed by the nul-terminating character. That nul-terminating character is what allows the array of characters to be considered (handled as) a string.
All functions in C that expect a string as an argument expect the sequence of characters to be nul-terminated. Why?
It has to do with the way all string functions work. Since the length isn't included as part of an array, string functions scan forward in the array until the nul-character (e.g. '\0' -- equivalent to decimal 0) is found. See ASCII Table and Description. Regardless of whether you are using strcpy, strchr, strcspn, etc., all string functions rely on the nul-terminating character being present to define where the end of that string is.
A comparison of two similar functions from string.h will emphasize the importance of the nul-terminating character. Take for example:
char *strcpy(char *dest, const char *src);
The strcpy function simply copies bytes from src to dest until the nul-terminating character is found telling strcpy where to stop copying characters. Now take the similar function memcpy:
void *memcpy(void *dest, const void *src, size_t n);
The function performs a similar operation, but does not consider or require the src parameter to be a string. Since memcpy cannot simply scan forward in src copying bytes to dest until a nul-terminating character is reached, it requires an explicit number of bytes to copy as a third parameter. This third parameter provides memcpy with the same size information strcpy is able to derive simply by scanning forward until a nul-terminating character is found.
This also shows what goes wrong in strcpy (or any function expecting a string) if you fail to provide the function with a nul-terminated string: it has no idea where to stop and will happily race off across the rest of your memory segment, invoking Undefined Behavior, until a nul-character just happens to be found somewhere in memory or a Segmentation Fault occurs.
That is why functions expecting a nul-terminated string must be passed a nul-terminated string and why it matters.
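To make the difference concrete, here is a simplified sketch of how the two functions could be written. These are illustrative re-implementations under made-up names (sketch_strcpy, sketch_memcpy), not the code of any actual C library:

```c
#include <stddef.h>

/* Copies bytes until the nul-terminator has been copied.
   The only stop condition is finding '\0' in src. */
char *sketch_strcpy(char *dest, const char *src)
{
    char *d = dest;
    while ((*d++ = *src++) != '\0')
        ;   /* empty body: the copy happens in the loop condition */
    return dest;
}

/* Copies exactly n bytes. A '\0' in src has no special meaning here. */
void *sketch_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}
```

Notice that sketch_strcpy has no way to stop if src has no nul-terminator, while sketch_memcpy copies straight through an embedded '\0' because its only stop condition is the count.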
Intuitively...
Think of an array as a variable (holds things) and a string as a value (can be placed in a variable).
They are certainly not the same thing. In your case the variable is too small to hold the string, so the string gets cut off. ("quoted strings" in C have an implicit null character at the end.)
However it's possible to store a string in an array that is much larger than the string.
Note that the usual assignment and comparison operators (= == < etc.) don't work as you might expect. But the strxyz family of functions comes pretty close, once you know what you're doing. See the C FAQ on strings and arrays.
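A small sketch of what that means in practice. The wrapper names same_text and assign_text are invented for this example; the library functions doing the work are strcmp and strcpy:

```c
#include <string.h>

/* Compares contents. Writing a == b would only compare the two addresses. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* "Assigns" src to dest by copying, since dest = src is not allowed for arrays.
   Assumes dest has room for strlen(src) + 1 bytes. */
char *assign_text(char *dest, const char *src)
{
    return strcpy(dest, src);
}
```

With char big[16]; the call assign_text(big, "hello") stores a 6-byte string in a 16-byte array, which is the "array much larger than the string" case.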
Is it possible to store the char '\0' inside a char array and then store different characters after? For example
char* tmp = "My\0name\0is\0\0";
I was taught that is actually called a string list in C, but when I tried to print the above (using printf("%s\n", tmp)), it only printed
"My".
Yes, it is certainly possible. However, you cannot use that array as a string and expect to get the content stored after the first '\0'.
By definition, a string is a char array terminated by the null character, '\0'. All string-related functions stop at the first null byte (for example, printf() with the %s format specifier, when passed an argument that contains a '\0' in the middle of the actual contents).
Quoting C11, chapter §7.1.1, Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. [...]
However, for byte-by-byte processing, you're good to go as long as you stay within the allocated memory region.
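As a sketch of such byte-by-byte processing, the function below walks a "string list" like the one in the question. It assumes the layout shown there: each entry ends with '\0' and the whole list ends with an empty entry (two '\0' in a row). The name count_list_entries is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Counts the entries in a list of strings stored back to back, where an
   empty string marks the end of the list (an assumption of this sketch). */
size_t count_list_entries(const char *list)
{
    size_t count = 0;
    /* strlen(p) + 1 skips the current entry and its terminator */
    for (const char *p = list; *p != '\0'; p += strlen(p) + 1)
        count++;
    return count;
}
```

For the literal in the question, the loop visits "My", "name" and "is" and then stops at the empty entry. Printing p with %s inside the loop would show each entry in turn.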
The problem you are having is with the function you are using to print tmp. Functions like printf assume that the string is null-terminated, so they stop when they see the first \0.
If you try the following code (POSIX), you will see more of the value in tmp:
#include <unistd.h>   /* write(), STDOUT_FILENO */
int main(void){
const char* tmp = "My\0name\0is\0\0";
write(STDOUT_FILENO, tmp, 12);   /* writes all 12 bytes, embedded '\0' included */
return 0;
}
How can I supply a string at runtime through the keyboard, rather than pre-initializing it, to a character pointer such as char *b;?
First, to clear things up a bit, as per C11 standard, chapter §7.1.1,
A string is a contiguous sequence of characters terminated by and including the first null
character.
and, as per §6.4.5
A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz".
So, they are not the same.
However, to read a string from the user, you can do either of the following:
Define a char array and scan the input into it (scanf(), fgets()).
Define a pointer, allocate memory, and then use scanf() or fgets() to read the input from the user.
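The second option could look roughly like this. read_line and strip_newline are hypothetical helper names, and cap (the maximum line size including the terminator) is a parameter chosen for this sketch:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Removes the trailing newline that fgets keeps, if there is one. Returns s. */
char *strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
    return s;
}

/* Hypothetical helper: reads one line of at most cap - 1 characters from in
   into newly allocated memory. Returns NULL on a bad cap, allocation failure,
   or end of input. The caller frees the result. */
char *read_line(FILE *in, size_t cap)
{
    if (cap == 0 || cap > (size_t)INT_MAX)
        return NULL;              /* fgets takes its size as an int */
    char *b = malloc(cap);
    if (b == NULL)
        return NULL;
    if (fgets(b, (int)cap, in) == NULL) {
        free(b);
        return NULL;
    }
    return strip_newline(b);      /* fgets already added the '\0' */
}
```

Usage would be something like char *b = read_line(stdin, 100); followed by a NULL check and, once done, free(b).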
You can define your string before compiling and running your code with the following syntax:
char * str = "Hello World";
This way you define a constant string; attempting to change its content results in undefined behavior. If you want, you can allocate memory through a pointer and then use functions like scanf() or fgets() to get its content from the user, or you can use functions like sprintf() to fill your string in your program. For example:
char *str = malloc(sizeof(char) * 20);   /* room for 19 characters plus '\0' */
sprintf(str, "%s", "Hello World");
If you want a dynamically sized string at runtime, you can implement something like a C++ vector and store your string in it.
Is it possible to copy a buffer to a string? strncpy can copy a string into an allocated string array; I'm wondering if it is possible to do the opposite.
char *buffer[50];
fgets(buffer, 50, stdin);
//how can i assign string in buffer to a single string (char)?
First, a C string is not just a char, but an array of char with the last element (or at least the last one that's counted as part of the string) set to the null character (numerically 0, also '\0' as a character constant).
Next, in the code you posted you probably meant char buffer[50] rather than char *buffer[50]... the version you have is an array of 50 char *s, but you need an array of 50 chars. After that's corrected, then...
Since fgets() always fills in a null char at the end of the string it read, buffer would already be a valid C string after you call fgets(). If you'd like to copy it to another string so you can reuse the buffer to read more input, you can use the usual string handling functions from <string.h>, such as strcpy(). Just make sure the string you copy it into is large enough to hold all the used characters plus a terminating null character.
This code copies the string into a newly malloc()ed string (error checking omitted):
char buffer[50];
char *str;
fgets(buffer,50,stdin);
str = malloc(strlen(buffer) + 1);
strcpy(str,buffer);
This code does the same, but copies to a char array on the stack (not malloc()ed):
char buffer[50];
char str[50];
fgets(buffer,50,stdin);
strcpy(str,buffer);
strlen() will tell you how many characters are used in the string, but doesn't count the terminating null (so you need to have one more character allocated than what strlen() returns). strcpy() will copy the characters and the null at the end from one string/buffer to another. It stops after the null, and doesn't know how much space you've allocated -- so you need to make sure it will find a null character before running out of space in the destination, or reaching the end of the source buffer. If in doubt, place a null at the end of the buffer yourself to make sure.
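If you would rather have the size check done in one place, a bounded copy along these lines is one option. This is only a sketch, and copy_bounded is a made-up name, not a standard function:

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dest only if it fits, terminator included.
   Returns 0 on success, -1 (leaving dest untouched) if dest is too small. */
int copy_bounded(char *dest, size_t destsize, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the null */
    if (needed > destsize)
        return -1;
    memcpy(dest, src, needed);
    return 0;
}
```

With the arrays from the example above, copy_bounded(str, sizeof str, buffer) would replace the strcpy call. It still requires buffer to hold a valid string, which fgets guarantees when it succeeds.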
It should be char buffer[50]; and yes, you can then use strncpy (which does not care if it got a static or a heap allocated zone).
But I would recommend using getline in your case.
First of all, you must have:
char buffer[50];
because otherwise you have an array of 50 char *s which is not what you want. That is, you read 50 chars from input and create addresses from them (which means "boom"!)
Second, yes, you can use strncpy to copy. Note that a string is basically an array of chars, terminated by '\0' (NUL). So in this case, buffer is indeed a string. You would want to copy the string only if you want to keep the original and modify the second (or keep a copy of original and then modify buffer). Otherwise, you can safely use the same buffer as the desired string.
Third, I don't know exactly what your input looks like, but whatever you want to do, you can most likely do it better with the *scanf functions.