I have read a lot about strtok(char* s1, char* s2) and its implementation. However, I still cannot understand what makes it a dangerous function to use in a multi-threaded program. Can somebody please give me an example of a multi-threaded program and explain the issue there? Please note that I am looking for an example that shows me where the problem arises.
ps: strtok(char* s1, char* s2) is part of the C standard library.
In the first call to strtok, you supply the string and the delimiters. In subsequent calls, the first parameter is NULL, and you just supply the delimiters. strtok remembers the string that you passed in.
In a multithreaded environment, this is dangerous because many threads may be calling strtok with different strings. It will only remember the last one and return the wrong result.
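To make the "remembers the string" part concrete, here is a simplified sketch of how a strtok-like function can be written. This is not the actual library source; my_strtok and the saved variable are hypothetical names used only for illustration. The point is that there is exactly one hidden cursor, shared by every caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for strtok; real library code differs. */
static char *saved;   /* the single hidden cursor shared by every caller */

char *my_strtok(char *s, const char *delims) {
    if (s == NULL)
        s = saved;                  /* continue where the last call stopped */
    if (s == NULL)
        return NULL;                /* nothing has been started yet */
    s += strspn(s, delims);         /* skip leading delimiters */
    if (*s == '\0') {
        saved = NULL;               /* string exhausted */
        return NULL;
    }
    char *end = s + strcspn(s, delims);   /* first delimiter after the token */
    if (*end != '\0') {
        *end = '\0';                /* terminate the token in place */
        saved = end + 1;            /* next call resumes after the delimiter */
    } else {
        saved = end;                /* token ran to the end of the string */
    }
    return s;
}
```

Any call with a non-NULL first argument overwrites saved, no matter which thread (or which function) started the previous tokenization.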
Here is a concrete example:
Suppose first that your program is multi-threaded, and in one thread of execution, the following code runs:
char str1[] = "split.me.up";
// call this line A
char *word1 = strtok(str1, "."); // returns "split", sets str1[5] = '\0'
// ...
// call this line B
char *word2 = strtok(NULL, "."); // we hope to get back "me"
And in another thread, the following code runs:
char str2[] = "multi;token;string";
// call this line C
char *token1 = strtok(str2, ";"); // returns "multi", sets str2[5] = '\0'
// ...
// call this line D
char *token2 = strtok(NULL, ";"); // we hope to get back "token"
The point is, we don't really know what will be in word2 and token2:
If the commands are run in the order (A), (B), (C), (D), then we will get what we want.
But if, say, the commands run in the order (A), (C), (B), (D), then command (B) will search for a . delimiter in "token;string"! This is because the NULL first argument to command (B) tells strtok to continue searching in the last non-NULL search string it was passed, and because command (C) has already run, strtok will use str2.
Then command (B) will return token;string, at the same time setting the new starting character of a search to the NUL terminator at the end of str2. Then command (D) will think it is searching an empty string, because it will begin its search at str2's NUL terminator, and so it will return NULL.
Even if you place commands (A) and (B) right next to each other, and commands (C) and (D) right next to each other, there is no guarantee that (B) will be executed right after (A) before either (C) or (D), etc.
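You do not need real threads to see the bad ordering. A small sketch, assuming we simply force the order (A), (C), (B), (D) on a single thread (interleaved is a hypothetical helper name, not something from the question):

```c
#include <string.h>

/* Runs lines (A), (C), (B), (D) in that order on one thread and
 * reports what (B) and (D) returned. */
void interleaved(char *str1, char *str2, char **word2, char **token2) {
    strtok(str1, ".");             /* (A) */
    strtok(str2, ";");             /* (C) replaces strtok's saved position */
    *word2  = strtok(NULL, ".");   /* (B) continues in str2, not str1 */
    *token2 = strtok(NULL, ";");   /* (D) finds nothing left to scan */
}
```

Called with the two strings from above, word2 ends up pointing at "token;string" and token2 is NULL, which is exactly the outcome described for that ordering. With real threads the same thing happens, except that you cannot predict when.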
If you create some sort of mutex or alternate guard to protect the use of the strtok function, and only call strtok from a thread which has obtained a lock on said mutex, then strtok is safe to use. However, it is probably better just to use the thread-safe strtok_r as others have said.
Edit: There is one more issue, that nobody else has mentioned, namely that strtok modifies and potentially uses global (or static, whatever) variables, and does so in a probably-not-thread-safe way, so even if you don't rely on repeating calls to strtok to get successive "tokens" from the same string, it may not be safe to use it in a multi-threaded environment without guards, etc.
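For comparison, here is a sketch of the same four calls using strtok_r, assuming a POSIX system (strtok_r is POSIX, not ISO C). The helper name interleaved_r is hypothetical. Each string gets its own save pointer, so the order of the calls no longer matters.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <string.h>

/* Same order as the bad case, (A), (C), (B), (D), but with one
 * caller-owned cursor per string. */
void interleaved_r(char *str1, char *str2, char **word2, char **token2) {
    char *save1, *save2;
    strtok_r(str1, ".", &save1);             /* (A) */
    strtok_r(str2, ";", &save2);             /* (C) touches only save2 */
    *word2  = strtok_r(NULL, ".", &save1);   /* (B) resumes in str1 */
    *token2 = strtok_r(NULL, ";", &save2);   /* (D) resumes in str2 */
}
```

With "split.me.up" and "multi;token;string" this gives "me" and "token", as originally hoped. In a threaded program each thread would keep its save pointer in a local variable.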
To explain in simple terms: when a function is described as not thread-safe, it means that its state does not belong to your thread alone; other threads can modify it too. It is like a cake shared by five friends at the same time: you cannot predict who ate which piece, or who altered it.
Every call to strtok() returns a pointer to a NUL-terminated token, and the function keeps its parsing position in static storage. Any subsequent call refers to that same storage and alters it, regardless of who made the call. That is why it is not thread-safe.
strtok_r(), on the other hand, takes an additional third argument, saveptr (which the caller must supply), that holds the position for subsequent calls. The state is therefore no longer hidden inside the library but under the developer's control.
An example (from Steven Robbins's book, UNIX Systems Programming):
An incorrect use of strtok to determine the average number of words per line.
#include <string.h>
#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "
static int wordcount(char *s) {
int count = 1;
if (strtok(s, WORD_DELIMITERS) == NULL)
return 0;
while (strtok(NULL, WORD_DELIMITERS) != NULL)
count++;
return count;
}
double wordaverage(char *s) { /* return average number of words per line in s */
int linecount = 1;
char *nextline;
int words;
nextline = strtok(s, LINE_DELIMITERS);
if (nextline == NULL)
return 0.0;
words = wordcount(nextline);
while ((nextline = strtok(NULL, LINE_DELIMITERS)) != NULL) {
words += wordcount(nextline);
linecount++;
}
return (double)words/linecount;
}
The wordaverage function determines the average number of words per line by using strtok to find the next line. The function then calls wordcount to count the number of words on this line. Unfortunately, wordcount also uses strtok, this time to parse the words on the line. Each of these functions by itself would be correct if the other one did not call strtok. The wordaverage function works correctly for the first line, but when wordaverage calls strtok to parse the second line, the internal state information kept by strtok has been reset by wordcount.
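One way to repair this, sketched here under the assumption that POSIX strtok_r is available, is to give each level of parsing its own save pointer. The names wordcount_r and wordaverage_r are mine, not the book's.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <string.h>

#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "

/* Counts the words on one line using a private cursor. */
static int wordcount_r(char *s) {
    char *save;
    int count = 0;
    for (char *w = strtok_r(s, WORD_DELIMITERS, &save); w != NULL;
         w = strtok_r(NULL, WORD_DELIMITERS, &save))
        count++;
    return count;
}

/* Returns the average number of words per line in s (0.0 if no lines). */
double wordaverage_r(char *s) {
    char *save;   /* separate from the cursor inside wordcount_r */
    int lines = 0, words = 0;
    for (char *line = strtok_r(s, LINE_DELIMITERS, &save); line != NULL;
         line = strtok_r(NULL, LINE_DELIMITERS, &save)) {
        words += wordcount_r(line);
        lines++;
    }
    return lines ? (double)words / lines : 0.0;
}
```

Because the line cursor and the word cursor are different variables, the inner tokenization can no longer reset the outer one. Like the original, this modifies the string it is given.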
I am currently working on a program which involves creating a template for an exam.
In the function where I allow the user to add a question to the exam, I am required to ensure that I use only as much memory as is required to store its data. I've managed to do so after a great deal of research into the differences between various input functions (getc, scanf, etc), and my program seems to be working, but I am concerned about one thing. Here is the code for my function; I've placed a comment on the line in question:
int AddQuestion(){
Question* newQ = NULL;
char tempQuestion[500];
char* newQuestion;
if(exam.phead == NULL){
exam.phead = (Question*)malloc(sizeof(Question));
}
else{
newQ = (Question*)malloc(sizeof(Question));
newQ->pNext = exam.phead;
exam.phead = newQ;
}
while(getchar() != '\n');
puts("Add a new question.\n"
"Please enter the question text below:");
fgets(tempQuestion, 500, stdin);
newQuestion = (char*)malloc(strlen(tempQuestion) + 1); /*Here is where I get confused*/
strcpy(newQuestion, tempQuestion);
fputs(newQuestion, stdout);
puts("Done!");
return 0;
}
What's confusing me is that I've tried running the same code but with small changes to test exactly what is going on behind the scenes. I tried removing the + 1 from my malloc, which I put there because strlen only counts up to but not including the terminating character and I assume that I want the terminating character included. That still ran without a hitch. So I tried running it but with - 1 instead under the impression that doing so would remove whatever is before the terminating character (newline character, correct?). Still, it displayed everything on separate lines.
So now I'm somewhat baffled and doubting my knowledge of how character arrays work. Could anybody help clear up what's going on here, or perhaps provide me with a resource which explains this all in further detail?
In C, strings are conventionally null-terminated. strlen, however, only counts the characters before the null. So you must always add one to the value of strlen to get enough space. Or call strdup.
A C string contains the characters you can see "abc" plus one you can't which marks the end of the string. You represent this as '\0'. The strlen function uses the '\0' to find the end of the string, but doesn't count it.
So
myvar = malloc(strlen(str) + 1);
is correct. However, what you tried:
myvar = malloc(strlen(str));
and
myvar = malloc(strlen(str) - 1);
while INCORRECT, MAY seem to work some of the time. This is because malloc typically allocates memory in chunks, (say maybe in units of 16 bytes) rather than the exact size you ask for. So sometimes, you may 'luck out' and end up using the 'slop' at the end of the chunk.
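As a minimal sketch of the correct pattern (copy_string is a hypothetical helper name; strdup does the same job where it is available):

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if malloc fails.
 * The caller is responsible for calling free() on the result. */
char *copy_string(const char *s) {
    size_t len = strlen(s);          /* characters before the '\0' */
    char *copy = malloc(len + 1);    /* +1 for the terminator itself */
    if (copy != NULL)
        memcpy(copy, s, len + 1);    /* copies the '\0' as well */
    return copy;
}
```

Note that a line read with fgets normally still has its '\n' in it, so the newline is counted by strlen and copied like any other character. The byte the + 1 pays for is the '\0' after it.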
Below is some code which I ran through a static analyzer. It came back saying there is a stack overflow vulnerability in the function that uses strtok below, as described here:
https://cwe.mitre.org/data/definitions/121.html
If you trace the execution, the variables used by strtok ultimately derive their data from the user_input variable in somefunction coming in from the wild. But I figured I prevented problems by first checking the length of user_input as well as by explicitly using strncpy with a bound any time I copied pieces of user_input.
somefunction(user_input) {
if (strlen(user_input) != 23) {
if (user_input != NULL)
free(user_input);
exit(1);
}
Mystruct* mystruct = malloc(sizeof(Mystruct));
mystruct->foo = malloc(3 * sizeof(char));
memset(mystruct->foo, '\0', 3);
strncpy(mystruct->foo,&(user_input[0]),2);
mystruct->bar = malloc(19 * sizeof(char));
memset(mystruct->bar, '\0', 19);
/* Remove spaces from user's input. strtok is not guaranteed to
* not modify the source string so we copy it first.
*/
char *input = malloc(22 * sizeof(char));
strncpy(input,&(user_input[2]),21);
remove_spaces(input,mystruct->bar);
}
void remove_spaces(char *input, char *output) {
const char space[2] = " ";
char *token;
token = strtok(input, space);
while( token != NULL ) {
// the error is indicated on this line
strncat(output, token, strlen(token));
token = strtok(NULL, space);
}
}
I presumed that I didn't have to malloc token per this comment, and elsewhere. Is there something else I'm missing?
strncpy does not increase the safety of your code; indeed, it may well make the code less safe by introducing the possibility of an unterminated output string. But the issue being flagged by the static analyser involves neither strncpy nor strtok; it's with strncat.
Although they are frequently touted as increasing code safety, that was never the purpose of strncpy, strncat nor strncmp. The strn* alternatives to str* functions are intended for use in a context in which string data is not null-terminated. Such a context exists, although it is rare in student code: fixed-length string fields in fixed-size database records. If a field in a database record always contains 20 characters (CHAR(20) in SQL terms), there's no need to force a trailing 0-byte, which could have been used to allow 21-character names (or whatever the field is). It's a waste of space, and the only reason that those unnecessary bytes might be examined by the database code is to check database integrity. (Not that the extra byte really helps maintain integrity, either. But it must be checked for correctness.)
If you were writing code which used or created fixed-length unterminated string fields, you would certainly need a set of string functions which accept a length argument. But the string library already had those functions: memcpy and memcmp. The strn versions were added to ease the interface when both null-terminated and fixed-length strings are being used in the same application; for example, if a null-terminated string is read from user input and needs to be copied into a fixed-length database field. In that context, the interface of strncpy makes sense: the database field must be completely cleared of old data, but the input string might be too short to guarantee that. So you can't use strcpy even if you check that it won't overflow (because it doesn't necessarily erase old data) and you can't use memcpy (because the bytes following the end of the input string are indeterminate). Hence an interface like strncpy, which guarantees that the destination will be filled but doesn't guarantee that it will be null-terminated.
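A small sketch of that fixed-length-field use case, assuming a hypothetical 8-byte field (FIELD_LEN and set_field are names I made up for illustration):

```c
#include <string.h>

#define FIELD_LEN 8   /* hypothetical fixed-width record field */

/* Overwrites the whole field: short values are padded with 0 bytes,
 * long values are cut at FIELD_LEN with no terminator added. */
void set_field(char field[FIELD_LEN], const char *value) {
    strncpy(field, value, FIELD_LEN);
}
```

After set_field(f, "abc") the field holds 'a', 'b', 'c' followed by five 0 bytes, so no old data survives. After set_field(f, "abcdefghij") it holds exactly "abcdefgh" and contains no 0 byte at all, which is fine for a database field and a bug waiting to happen if you treat f as a C string.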
strncmp and strnlen do have some applications which don't necessarily have to do with fixed-length string records, but they are not safety-related either. strncmp is handy if you want to know whether a given string is a prefix of another string (although a startswith function would have more directly addressed this use case) and strnlen lets you answer the question "Are there at least four characters in this string?" without having to worry about how many cycles would be wasted if the string continued for another four million characters. But that doesn't justify using them in other, more normal, contexts.
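For what it's worth, the prefix test mentioned above is a one-liner. starts_with is a hypothetical name; it is not a standard library function.

```c
#include <stdbool.h>
#include <string.h>

/* True if s begins with prefix. Comparison stops at the first
 * difference or at the end of either string, so s may be shorter
 * than prefix. */
bool starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```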
OK, that was a bit of a detour. Let's get back to strncat, whose prototype is
char *strncat(char *dest, const char *src, size_t n);
where n is the maximum number of characters to copy. As the man page notes, you (and not the standard library) are responsible for ensuring that the destination has n+1 bytes available for the copy. The library function cannot take responsibility, because it cannot know how much space is available, and it hasn't asked you to specify that.
In my opinion, that makes strncat completely useless. In order to know how much space is available in the destination, you need to know where the concatenation's copy will start. But if you knew that, why on earth would you ask the standard library to scan over the destination looking for the concatenation point? In any case, you are not verifying how much space is available; you simply call:
strncat(output, token, strlen(token));
That does exactly the same thing as strcat(output, token) except that it scans token twice (once to count the bytes and a second time to copy them) and during the copy it does a redundant check to ensure that the count has not been exceeded while copying.
A "safe" version of strncat would require you to specify the length of the destination, but since there is no such function in the standard C library and also no consensus as to what the prototype for such a function would be, you need to guarantee safety yourself by tracking the amount of space used in output by each concatenation. As an extra benefit, if you do that, you can then make the computational complexity of a sequence of concatenations linear in the number of bytes copied, which one might intuitively expect, as opposed to quadratic, as implemented by strcat and strncat.
So a safe and efficient procedure might look like this:
void remove_spaces(char *output, size_t outmax,
                   char *input) {
  if (outmax == 0) return;
  char *token = strtok(input, " ");
  char *outlimit = output + outmax;
  while( token ) {
    size_t tokelen = strlen(token);
    /* truncate so that one byte is always left for the terminating 0 */
    if (tokelen >= (size_t)(outlimit - output))
      tokelen = outlimit - output - 1;
    memcpy(output, token, tokelen);
    output += tokelen;
    token = strtok(NULL, " ");
  }
  *output = 0;
}
The CWE warning does not mention strtok at all, so the question in the title itself is a red herring. strtok is one of the few parts of your code which is not problematic, although (as you note) it does force an otherwise unnecessary copy of the input string, in case that string is in read-only memory. (As noted above, strncpy does not guarantee that the copy is null-terminated, so it is not safe here. strdup, which needs to be paired with free, is the safest way to copy a string. Fortunately, as of C23 it is part of the C standard instead of just being available almost everywhere.)
That might be a good enough reason to avoid strtok. If so, it's easy to get rid of:
void remove_spaces(char *output, size_t outmax,
                   /* This version doesn't modify input */
                   const char *input) {
  if (outmax == 0) return;
  char *outlimit = output + outmax;
  /* skip any run of spaces; stop at the end of input */
  while ( *(input += strspn(input, " ")) ) {
    size_t tokelen = strcspn(input, " ");   /* length of this word */
    size_t avail = (size_t)(outlimit - output);
    /* truncate so that one byte is always left for the terminating 0 */
    size_t cpylen = (tokelen < avail)
                    ? tokelen
                    : avail - 1;
    memcpy(output, input, cpylen);
    output += cpylen;
    input += tokelen;
  }
  *output = 0;
}
A better interface would manage to indicate whether the output was truncated, and perhaps give an indication of how many bytes were necessary to accommodate the operation. See snprintf for an example.
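Here is a sketch of what such an interface could look like, modelled loosely on the snprintf convention. The name remove_spaces_len and the exact return convention are my own assumptions, not an established API.

```c
#include <stddef.h>
#include <string.h>

/* Copies input to output with the spaces removed, writing at most
 * outmax bytes including the terminating 0.
 * Returns the length the complete result needs, not counting the 0;
 * the output was truncated if the return value is >= outmax.
 * output may be NULL when outmax is 0 (useful for measuring). */
size_t remove_spaces_len(char *output, size_t outmax, const char *input) {
    size_t needed = 0;
    while (*(input += strspn(input, " "))) {
        size_t tokelen = strcspn(input, " ");
        if (outmax != 0 && needed < outmax - 1) {
            size_t room = outmax - 1 - needed;   /* bytes still free */
            memcpy(output + needed, input, tokelen < room ? tokelen : room);
        }
        needed += tokelen;
        input += tokelen;
    }
    if (outmax != 0)
        output[needed < outmax ? needed : outmax - 1] = '\0';
    return needed;
}
```

A caller can then compare the return value with the buffer size, or call it once with (NULL, 0, input) to learn how much to allocate.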
I need to create a method that gets commands from users using scanf and runs a function. The command can be as simple as help or list, but it can also be a command that has an argument, like look DIRECTION or take ITEM. What is the best way to go about this? I could just loop through the characters of a single given string and check it manually, but I was wondering whether there was a better way of doing this.
scanf("%s %s", command, argument);
This won't work if there's no argument. Is there a way around this?
There is a 'method' that may work. In fact, two come to mind.
Both rely on whitespace characters (in plain English, '\n', ' ' and '\t') separating the arguments, and I assume this is good enough.
1
First, the relatively easy one - using main(int argc,char *argv[]) as most CLI programs do.
Then, running a long string of if()s/else if()s which check if the input string matched valid arguments , by testing if strcmp(argv[x],expected_command) returns 0.
You may not yet have been taught about how to use this, and it may appear scary, but its quite easy if you are familiar with string.h, arrays and pointers already.
Google searches and YouTube videos may be of help, and it won't take more than 20 or so minutes.
2
Second, if your program has a real CLI 'UI' and runs in a loop rather than terminating once output is generated (unlike, say, cat or ls), then you take 'command' strings as input within the program.
This means you will have to, apart from and before the if-ed strcmp()s , ensure that you take input with scanf() safely, and that you are able to take multiple strings as input, since you talk of sub-arguments like look DIRECTION.
The way I have done this myself (in the past) is as follows :
1. Declare a command string, say char cmd[21] = ""; and (optionally) initialise it to be empty , since reading an uninitialised string is UB (and the user may enter EOF).
2. Declare a function (for convenience) to check scanf() say like so:
int handle_scanf(int returned,int expected){
if(returned==expected)
return 0;
if(returned==EOF){
puts("\n Error : Input Terminated Prematurely.");
/* you may alternatively do perror() but then
will have to deal with resetting errno=0 and
including errno.h */
return -1;
}
else{
puts("\n Error : Insufficient Input.");
return -2;
}
}
It can be used as: if(handle_scanf(scanf(format,&variable),1)==0) {...}
This works because scanf() returns the number of items 'taken' (items that matched the expected format string and were hence saved), and here only 1 item is expected.
3. Declare a function (for convenience) to clear/flush stdin so that if and when unnecessary input is left in the input stream , (which if not dealt with, will be passed to the next place where input is taken) it can be 'eaten'.
I do it like so :
void eat()
{
int eat; while ((eat = getchar()) != '\n' && eat != EOF);
}
Essentially clears input till a newline or EOF is read. Since '\n' and EOF represent End Of Line and End Of File , and modern I/O is line buffered and performed through the stdin file , it makes sense to stop upon reading them.
EDIT : You may alternatively use a macro, for slightly better performance.
4. Print a prompt and take input, like so :
fputs("\n >>> ",stdout);
int check = handle_scanf(scanf("%20s",cmd),1);
Notice what I did here ?
"%20s" does two things - stops buffer overflow (because more than 20 chars won't be scanned into cmd) and also stops scanning when a whitespace char is encountered. So, your main command must be one-word.
5. Check if the command is valid.
This is to be done with the aforementioned list of checking if strcmp(cmd,"expected_cmd")==0 , for all possible expected commands.
If there is no match, with an else , display an error message and call eat();(arguments to invalid command can be ignored) but only if(check != -1).
If check==-1 , this may mean that the user has sent an EOF signal to the program, in which case, calling eat() within a loop will result in an infinite loop displaying the error message, something which you don't want.
6. If there is a match, absorb the whitespace separating char and then scanf() into a char array ( if the user entered, look DIRECTION, DIRECTION is still in the input stream and will only now be saved to said char array ). This can be done like so :
#define SOME_SIZE 100 // use an appropriate size
if(strcmp(cmd,"look")==0 && check==0){ // do if(check==0) before these ifs, done here just for my convenience)
getchar(); // absorb whitespace separator
char strbuff[SOME_SIZE] = ""; // string buffer of appropriate size
if(handle_scanf(scanf("%99[^\n]",strbuff),1)==0){
eat();
/* look at DIRECTION :) */
}
// handle_scanf() generated appropriate error msg if it doesn't return 0
}
Result
All in all, this code handles scanf mostly safely and can indeed be used in a way that the user will only type , say :
$ ./myprogram
>>> look DIRECTION
# output
>>> | #cursor
If it is all done within a big loop inside main() .
Conclusion
In reality, you may end up needing to use both together if your program is complex enough :)
I hope my slightly delayed answer is of help :)
In case of any inaccuracies , or missing details, please comment and I will get back to you ASAP
Here's a good way to parse an inputted string using strtok and scanf with a limit of 99 characters
#include <stdio.h>
#include <string.h>
char command[100] = ""; //room for 99 characters plus the terminating '\0'
scanf("%99[^\n]%*c", command); //This gets the entire line, spaces included, up to 99 characters
char *token;
token = strtok(command, " "); //token = the first string separated by a " "
if (token == NULL){
    //empty line, nothing to do
}
else if (strcmp(token, "help") == 0){
    //do function
}
else if (strcmp(token, "go") == 0){ //if the command has an argument, you have to get the next string
    token = strtok(NULL, " "); //this gets the next string separated by a space
    if (token != NULL && strcmp(token, "north") == 0){ //token is NULL if no argument was given
        //do function
    }
}
You can keep using token = strtok(NULL, " "); until token == NULL, which signifies the end of the string. Always check for NULL before passing token to strcmp.
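An alternative sketch, not taken from the answers above: read the whole line first (for example with fgets) and let the return value of sscanf tell you whether an argument was present. The names parse_command, CMD_LEN and ARG_LEN are hypothetical.

```c
#include <stdio.h>

#define CMD_LEN 20   /* longest command word accepted */
#define ARG_LEN 99   /* longest argument text accepted */

/* Splits one input line into a command word and the rest of the line.
 * Returns 0 for a blank line, 1 for a bare command,
 * 2 for a command followed by an argument. */
int parse_command(const char *line, char cmd[CMD_LEN + 1], char arg[ARG_LEN + 1]) {
    cmd[0] = arg[0] = '\0';
    /* %20s reads one word; %99[^\n] reads the rest up to the newline */
    int n = sscanf(line, "%20s %99[^\n]", cmd, arg);
    return n < 0 ? 0 : n;   /* sscanf returns EOF if the line has no word */
}
```

Typical use would be something like fgets(line, sizeof line, stdin) followed by parse_command(line, cmd, arg), then a chain of strcmp tests on cmd. Because the line has already been read in full, a missing argument cannot make the program block waiting for more input, which is the problem with scanf("%s %s", ...).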
I have the following query string
address=1234&port=1234&username=1234&password=1234&gamename=1234&square=1234&LOGIN=LOGIN
I am trying to parse it into different variables: address,port,username,password,gamename,square and command (which would hold LOGIN)
I was thinking of using strtok but I don't think it would work. How can I parse the string to capture the variables ?
P.S - some of the fields might be empty - no gamename provided or square
When parsing a string that may contain an empty-field between delimiters, strtok cannot be used, because strtok will treat any number of sequential delimiters as a single delimiter.
So in your case, if the variable=values fields may also contain an empty-field between the '&' delimiters, you must use strsep, or other functions such as strcspn, strpbrk or simply strchr and a couple of pointers to work your way down the string.
The strsep function is a BSD function and may not be included with your C library. GNU includes strsep and it was envisioned as a replacement for strtok simply because strtok cannot handle empty-fields.
(If you do not have strsep available, you will simply need to keep a start and end pointer and use a function like strchr to locate each occurrence of '&' setting the end pointer to one before the delimiter and then obtaining the var=value information from the characters between start and end pointer, then updating both to point one past the delimiter and repeating.)
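A rough sketch of that strchr approach, for the case where strsep is not available. split_fields is a hypothetical helper; it cuts the string in place and keeps empty fields rather than skipping them.

```c
#include <stddef.h>
#include <string.h>

/* Splits s in place at every occurrence of delim, keeping empty fields.
 * Stores up to max pointers in fields and returns how many were stored. */
size_t split_fields(char *s, char delim, char **fields, size_t max) {
    size_t n = 0;
    char *start = s;
    while (n < max) {
        char *end = strchr(start, delim);   /* next delimiter, if any */
        fields[n++] = start;
        if (end == NULL)
            break;                          /* last field reached */
        *end = '\0';                        /* terminate this field */
        start = end + 1;                    /* move one past the delimiter */
    }
    return n;
}
```

Given "gamename=1234&&square=1234" and '&', this yields three fields, the middle one being an empty string. Each non-empty field can then be split again at '=' in the same way.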
Here, you can use strsep with a delimiter of "&\n" to locate each '&' (the '\n' char included presuming the line was read from a file with a line-oriented input function such as fgets or POSIX getline). You can then simply call strtok to parse the var=value text from each token returned by strsep using "=" as the delimiter (the '\n' having already been removed from the last token when parsing with strsep)
An example inserting a specific empty-field for handling between "...gamename=1234&&square=1234...", could be as follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void) {
char array[] = "address=1234&port=1234&username=1234&password=1234"
"&gamename=1234&&square=1234&LOGIN=LOGIN",
*query = strdup (array), /* duplicate array, &array is not char** */
*tokens = query,
*p = query;
while ((p = strsep (&tokens, "&\n"))) {
char *var = strtok (p, "="),
*val = NULL;
if (var && (val = strtok (NULL, "=")))
printf ("%-8s %s\n", var, val);
else
fputs ("<empty field>\n", stderr);
}
free (query);
}
(note: strsep takes a char** parameter as its first argument and will modify the argument to point one past the delimiter, so you must preserve a reference to the start of the original allocated string (query above)).
Example Use/Output
$ ./bin/strsep_query
address 1234
port 1234
username 1234
password 1234
gamename 1234
<empty field>
square 1234
LOGIN LOGIN
(note: the conversion of "1234" to a numeric value has been left to you)
Look things over and let me know if you have further questions.