strtok not going through all tokens - c

I'm trying to implement a shell as part of a school assignment, and I'm stuck on the file input/output redirection part.
More specifically, I've come up with a function which allows me to detect whether or not the command entered specifies a '>' or '<' or even a '|'.
Ideally, if I enter ls -a > ls.txt, then the tokens ls -a and ls.txt should be returned.
My code doesn't do this; it only returns ls -a and then stops.
My code is below:
/*commandLine is a char* taken in from the user, and is a null-terminated string */
int counter = 0;
parsedLine = strtok(commandLine, ">");
while (parsedLine != NULL)
{
    if (counter == 0)
    {
        strncpy(parsedCpy, parsedLine, strlen(parsedLine));
        parseCommand(parsedCpy, commands);
        counter++;
    }
    else
    {
        redirect->re_stdout = parsedLine;
    }
    parsedLine = strtok(NULL, ">");
}
I've tried it in another test file just to see if there was something wrong, but this test file (code below) returns the expected result (that is, ls -a and ls.txt):
char myString[] = "ls -a > ls.txt";
char* parsed;
parsed = strtok(myString, ">");
while (parsed != NULL)
{
    printf("%s\n", parsed);
    parsed = strtok(NULL, ">");
}
Is there something that I'm just not understanding? I don't really see where I'm going wrong, since the code itself is nearly the same in both cases.

Note that strncpy won't null-terminate a string unless the terminating null byte is part of the source being copied. See man strncpy. It says:
Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
That could be horsing something else up depending upon what parseCommand does.
In this case, you should just use strcpy. A strncpy doesn't really do anything for you if you're giving it the length of the source string, unless you're intentionally trying to avoid copying the null terminator. So you should use strcpy(parsedCpy, parsedLine);

I cannot see how parsedLine is declared, but it needs to be handled explicitly and carefully, i.e. make sure the pointer to that value is not changed except by strtok(), and make sure that it remains null-terminated. One thing I do when using strtok() for multiple calls is to use an intermediate value to collect results, helping to keep the target buffer pure and unchanged except by strtok().
A small code snippet to illustrate:
char a[] = {"ls -a > ls.txt"};
char *buff;
char keep[80];
buff = strtok(a, ">");
strcpy(keep, buff);
buff = strtok(NULL, ">");
strcat(keep, buff);
This usage of strtok() is clean, i.e. it does not allow buff to be affected except by another call to strtok().
By comparison, this section of your code is a little scary because I do not know the output of the strncpy(), which depends so heavily on the third argument and can corrupt (place unexpected results into) the string passed to parseCommand:
if (counter == 0)
{
    strncpy(parsedCpy, parsedLine, strlen(parsedLine));
    parseCommand(parsedCpy, commands);
    counter++;
}
else
{
    redirect->re_stdout = parsedLine;
}
parsedLine = strtok(NULL, ">");
Along the lines of keeping the target buffer pure (even though it does not appear to be an issue here), strtok() is not thread-safe: it keeps its current position in hidden internal state shared by every caller. If a function using strtok() is used in a multithreaded process, that state is subject to any number of calls from other threads, resulting in unexpected, and perhaps even undefined, behavior. In this case using strtok_r() is a better option.

Related

C - Segmentation fault using strtok

I have this code which reads multiple files and prints certain values. After reading some of the files, at a certain moment my while loop stops and shows a segmentation fault...
Here is my code:
int main () {
    const char s[2] = ",";
    const char s2[2] = ":";
    char var1[] = "fiftyTwoWeekHigh\"";
    char *fiftyhigh;
    char *fiftyhigh2;
    char *fiftyhigh_token;
    char *fiftyhigh2_token;
    char var2[] = "fiftyTwoWeekLow\"";
    char *fiftylow;
    char *fiftylow2;
    char *fiftylow_token;
    char *fiftylow2_token;
    char var3[] = "regularMarketPrice\"";
    char *price;
    char *price2;
    char *price_token;
    char *price2_token;
    FILE *fp;
    char* data = "./data/";
    char* json = ".json";
    char line[MAX_LINES];
    char line2[MAX_LINES];
    int len;
    char* fichier = "./data/indices.txt";
    fp = fopen(fichier, "r");
    if (fp == NULL){
        printf("Impossible d'ouvrir le fichier %s", fichier); /* "Unable to open file %s" */
        return 1;
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        char fname[10000];
        len = strlen(line);
        if (line[len-1] == '\n') {
            line[len-1] = 0;
        }
        int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
        if (ret < 0) {
            abort();
        }
        printf("%s\n", fname);
        FILE* f = fopen(fname, "r");
        while ( fgets( line2, MAX_LINES, f ) != NULL ) {
            fiftyhigh = strstr(line2, var1);
            fiftyhigh_token = strtok(fiftyhigh, s);
            fiftyhigh2 = strstr(fiftyhigh_token, s2);
            fiftyhigh2_token = strtok(fiftyhigh2, s2);
            printf("%s\n", fiftyhigh2_token);
            fiftylow = strstr(line2, var2);
            fiftylow_token = strtok(fiftylow, s);
            fiftylow2 = strstr(fiftylow_token, s2);
            fiftylow2_token = strtok(fiftylow2, s2);
            printf("%s\n", fiftylow2_token);
            price = strstr(line2, var3);
            price_token = strtok(price, s);
            price2 = strstr(price_token, s2);
            price2_token = strtok(price2, s2);
            printf("%s\n", price2_token);
            //printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
        }
        fclose(f);
    }
    fclose(fp);
    return 0;
}
and the output is:
./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault
It looks like my program stops because it can't reach certain data in a certain file... Is there a way to allocate more memory? My MAX_LINES is already set to 6000.
I'm assuming that the lines in your file look something like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }
In other words it's some kind of JSON format. I'm assuming that the line starts with '{' so each line is a JSON object.
You read that line into line2, which now contains:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0
Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.
Now let's trace through the code here:
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
First you call strstr to find the position of "fiftyTwoWeekHigh". This will return a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0
Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips the leading colon and returns the next token, which (because the string we're looking at, which now ends after "100", has no more colons) is the rest of the string, in other words, the number.
So you've gotten your number, but probably not in the way you expected? There was really no point in the second call to strtok since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
Now we try to find "fiftyTwoWeekLow":
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
This is basically the same process, and after you call strtok, line2 looks like this:
{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0
Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, you'd have been unable to find it due to the \0 added after "fiftyTwoWeekHigh" earlier. In that case, strstr would have returned NULL. Passing NULL to strtok means "continue from where the previous call stopped", which here is the end of the string holding "100", so strtok would return NULL too, and then you'd definitely have gotten a seg fault after passing NULL to strstr.
So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.
If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok then you should:
1. Read line2.
2. Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns NULL. This will break up the line into tokens that each look like "someField":100.
3. Isolate the field name and value from each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because it will change the internal state of strtok and you won't be able to use strtok(NULL, ",") to continue processing the line.
4. Test the field name, and depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother to strip off the quotes; just include them in the string you're comparing with.
5. Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.
You may be able to pass ",{}" as the delimiter to strtok in order to get rid of any opening and closing curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON only contains simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
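Did you mean '\0'? (Assigning 0 is equivalent, since '\0' is simply the character with value 0, but '\0' states the intent more clearly.)
if (line[len-1] == '\n') {
    line[len-1] = 0;
}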
I advise you to use gdb to see where the segfault occurs and why.
I don't think you have to allocate much more memory. But the segfault may happen because there is no more data and you still print the result.
Use if(price2_token!=NULL) printf("%s\n", price2_token); for example.

Nested strtok in c resulting in an infinite loop

I have the user enter a username, and I then go to this file and extract the values corresponding to that particular user. I know the fault is with the way that I am using strtok, as it only works for the first user.
Once I find the user, I want to stop searching in the file.
int fd;
fd=open(fileName,O_RDONLY,0744);
if (fd==-1)
{
    printf("The file userDetails.txt failed to open.\n");
    exit(1);
}
int fileSize = sizeof(fileOutput)/sizeof(fileOutput[0]); //size of file
printf("%d\n",fileSize);
int bytesRead = read(fd,&fileOutput,fileSize);
//TERMINATING BUFFER PROPERLY
fileOutput[bytesRead] = '\0';
printf("%s\n",fileOutput);
//READING LINE BY LINE IN FILE
char *line;
char *data;
char *name;
char *saltValue;
char *encryptedValue;
line = strtok(fileOutput,"\n"); //SPLIT ACCORDING TO LINE
while (line != NULL)
{
    data = strtok(line, ":");
    while (data != NULL)
    {
        name = data;
        if (strcmp(name,userName)==0)
        {
            printf("%s\n","User exists");
            saltValue = strtok(NULL,":");
            printf("%s\n",saltValue);
            encryptedValue = strtok(NULL, ":");
            printf("%s\n",encryptedValue);
            break;
        }
        else
        {
            break;
        }
    }
    if (strcmp(name,userName)==0) //user found
    {
        break;
    }
    else //user not found
    {
        strtok(NULL,"\n");
    }
}
If you are limited to read(), that's fine, but you can only use strtok once, on "\n", to parse each line from fileOutput, not nested again to parse the ':'. strtok remembers where it stopped in a single hidden pointer, so the inner strtok(line, ":") replaces the position the outer "\n" loop was relying on, and the next strtok(NULL, "\n") continues from inside the current line rather than from the next line. (Note also that your loop never assigns the result of strtok(NULL, "\n") back to line, so line never changes, which is why the loop never ends.)
Instead, use a single pointer on each line with strchr (line, ':') to locate the first ':' within the line, and then strncmp() userName against the characters between the start of line and that ':'. For example, if you have a function to check if the userName is contained in your file (returning 0 on success and 1 on failure) you could do:
...
for (char *line = strtok(fileOutput,"\n"); line; line = strtok (NULL, "\n"))
{
    char *p = strchr (line, ':');   /* find first ':' */
    if (!p)                         /* if not found, bail */
        break;
    /* check name; the lengths must match too, or "bobby" would match a line starting with "bob:" */
    if ((size_t)(p - line) == strlen (userName) &&
        strncmp (userName, line, p - line) == 0) {
        printf ("found user: %s hash: %s\n", userName, p+1);
        return 0;
    }
}
fputs ("user not found.\n", stdout);
return 1;
This is probably one of the simpler approaches you could take.
strtok() modifies its input string and keeps its position in hidden static state, which makes it impossible to nest calls: the inner loop destroys the work of the outer strtok(), making it impossible to continue.
Apart from this, using strtok() for your problem is not adequate for another reason: if you try to use it to parse the /etc/passwd file (or one of the similar formats that we cope with today), you'll run into trouble with empty fields. If you have an empty field (two consecutive : chars), strtok() will skip over both, leaving the empty field completely undetected. strtok() is an old, legacy function that was written to cope with the characters (space, \t and \n) that are used to separate arguments in the Bourne shell. In the case of /etc/passwd you need to cope with possibly empty fields, and that makes it impossible to use strtok() to parse them.
You can easily use strchr() instead to search for the : of /etc/passwd in a non-skipping way; just write something like this (you can encapsulate it in a function):
char *not_strtok_but_almost(char *s, const char *delim)
{
    static char *p = NULL;  /* this makes the function non-reentrant, like strtok() */
    char *saved;

    if (s)                  /* first call: start at the beginning of the new string */
        p = s;
    if (p == NULL)          /* the previous call already returned the last field */
        return NULL;
    saved = p;              /* the current field starts here (it may be empty) */
    /* search for the delimiter after the value */
    while (*p && !strchr(delim, *p)) /* while *p not null, and not in the delimiter set */
        p++;
    /* *p is at the end of the string or p points to one of the delimiters */
    if (*p == '\0') {
        p = NULL;           /* no delimiter left: this was the last field */
    } else {
        *p++ = '\0';        /* terminate the field and step past exactly one delimiter */
    }
    return saved;
}
This function has all the trouble of strtok(3) but you can use it (taking care of its non-reentrancy and that it modifies the source string, making it not nestable on several loops) because it doesn't skip all the separators in one shot, but just stops after the first separator found.
To solve the nesting problem, you can act in a different way. Let's assume you have several identifiers separated by commas (as in the member list of an /etc/group line), which would seem to require nested calls (it doesn't here: as the names field is the last one, the outer loop would only call strtok again to get a NULL). You can process your line breadth-first instead of depth-first: you first find all the fields at the first level, and then go, field by field, reading its subfields (which will probably use a different delimiter).
As all of these modifications are done in the same string, there is no need to allocate a buffer for each field and strdup() the strings before use; the work can be done in the same buffer, strdup()ing the string at the beginning if you need to store the different subfields.
Feel free to comment if you have doubts about this (be careful, as I have not tested the routine above; it probably has a bug).

Substring garbage in C

I want to make a program that cuts a string into 2 strings if the string contains a '?'.
For example,
Test1?Test2
should become
Test1 Test2
I am trying to do it dynamically but at the prints I get some garbage and I cannot figure out what I am doing wrong. Here is my code:
for(i=0;i<strlen(expression);i++){
    if(expression[i]=='?' || expression[i]=='*'){
        position=i; break;
    }
}
printf("position=%d\n",position);
if(position!=0){
    before = (char*) malloc(sizeof(char)*(position +1 ));
    strncpy(before,expression,position);
    before[strlen(before)]='\0';
}
if(position!=strlen(expression))
{
    after = (char*) malloc(sizeof(char)*( strlen(expression)- position+1));
    strncpy(after,expression+position+1,strlen(expression)-position+1);
    after[strlen(after)]='\0';
}
printf("before:%s,after:%s\n",before,after);
Since it is illegal to take the length of a string before it is null-terminated, you cannot do this:
before[strlen(before)]='\0';
You need to do this instead:
before[position]='\0';
Same goes for the after part - this is illegal:
after[strlen(after)]='\0';
You need to compute the length of after upfront, and then terminate the string using the pre-computed length.
You can use strsep (a BSD/glibc extension, not ISO C):
char* token;
while ((token = strsep(&string, "?")) != NULL)
{
    printf("%s\n", token);
}
Instead of doing it manually, you can also make good use of strtok().
for(i=0;i<strlen(expression);i++){
This is very inefficient. strlen(expression) gets computed at every loop iteration, so you get O(n²) complexity (unless the compiler is clever enough to prove that expression stays constant and to move the computation of strlen(expression) before the loop). It should be:
for (i=0; expression[i]; i++) {
(since expression[i] is zero only when you reached the terminating null byte; for readability reasons you could have coded that condition expression[i] != '\0' if you wanted to)
And as dasblinkenlight answered, you should not compute strlen on an uninitialized buffer.
You should compute strlen(expression) once and for all, i.e. start with:
size_t exprlen = strlen(expression);
and use exprlen everywhere else instead of strlen(expression).
Also, use strdup(3):
after = (char*) malloc(sizeof(char)*( strlen(expression)- position+1));
strncpy(after,expression+position+1,strlen(expression)-position+1);
after[strlen(after)]='\0';
should be
after = strdup(expression+position+1);
and you forgot a very important thing: always test against failure of memory allocation (and of other system functions); so
before = (char*) malloc(sizeof(char)*(position +1));
should really be (since sizeof(char) is always 1):
before = malloc(position+1);
if (!before) { perror("malloc of before"); exit(EXIT_FAILURE); }
Also, if the string pointed by expression is not used elsewhere you could simply clear the char at the found position (so it becomes a string terminator) like
if (position>=0) expression[position] = '\0';
and set before=expression; after=expression+position+1;
Finally, learn more about strtok(3), strchr(3), strstr(3), strpbrk(3), strsep(3), sscanf(3).
BTW, you should always enable all warnings and debugging info in the compiler (e.g. compile with gcc -Wall -g). Then use the debugger (like gdb) to step by step and understand what is happening... If a memory leak detector like valgrind is available, use it.

When I use strtok(), is there a way to pick out individual tokens?

If, for instance, I have a string Hi:my:name:is:lacrosse1991:, how could I use strtok to retrieve just the is (in other words, the fourth field)? This would be similar to how you would use cut in bash, where I would just do cut -d ":" -f 4. If this is the wrong function for such a thing, please let me know; any help is much appreciated, thanks! (I've just recently started teaching myself C, so I apologize ahead of time if these are obvious questions.)
Here is an example of the cut command that I would use:
x=$(echo "$death" | cut -d ':' -f 4)
y=$(echo "$death" | cut -d ':' -f 5)
z=$(echo "$death" | cut -d ':' -f 6)
If you know it's the 4th field, you just call strtok() 4 times; there is no way of jumping to the 4th field without scanning the string up to it (which is what strtok is doing).
I'd avoid strtok for this purpose (and most others, for that matter). Under the circumstances, I think I'd just use sscanf with a scanset conversion:
char field_four[128];
sscanf(input_string, "%*[^:]:%*[^:]:%*[^:]:%127[^:]:", field_four);
In this, we start by repeating %*[^:]: three times. Each of these reads a string consisting of any characters except a colon, then a colon. The * means that the field should be read from the input, but ignored (not assigned to anything). Then, for the fourth field we have the same thing, except we do not include the *, so that field does get assigned, in this case to field_four (though you should obviously use a more meaningful name).
For the fourth field, I've also added a specification of the maximum length. Since sscanf always includes a \0 to terminate the string, but doesn't include it in the count, we need to specify a size one smaller than the size of the buffer.
One way to extract the Nth field (whilst keeping the string in original condition after the call) is to use the following getFld function. First, the requisite headers:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
Now the function itself, which I hope is documented well enough:
char *getFld (char *srchStr, char delim, int numFld) {
    char *copyStr, *retStr, *tmpStrPtr, delims[2];
    // Make a copy so as to not damage original.
    if ((copyStr = strdup (srchStr)) == NULL) return NULL;
    // Create delimiter string from character.
    delims[0] = delim; delims[1] = '\0';
    retStr = NULL;
    // Start loop, extracting fields.
    tmpStrPtr = strtok (copyStr, delims);
    while (tmpStrPtr != NULL) {
        // If this is the field we want, make a copy.
        if (numFld == 0) retStr = strdup (tmpStrPtr);
        // Get next field.
        tmpStrPtr = strtok (NULL, delims);
        numFld--;
    }
    // Clean up, return field copy (must be freed eventually) or NULL.
    free (copyStr);
    return retStr;
}
And, finally, a test program for it:
int main (void) {
    int i = 0;
    char str[] = "Hi:my:name:is:lacrosse1991";
    char *fld;
    while ((fld = getFld (str, ':', i)) != NULL) {
        printf ("Field %d is '%s'\n", i, fld);
        free (fld);
        i++;
    }
    return 0;
}
Upon compiling and running this, I get:
Field 0 is 'Hi'
Field 1 is 'my'
Field 2 is 'name'
Field 3 is 'is'
Field 4 is 'lacrosse1991'
Now, keep in mind that strdup is not part of ISO C before C23 (it is a POSIX function), so if your implementation doesn't have it, you will need to write your own. You may also want to change the behaviour on strdup failure, since the current case cannot be distinguished from an out-of-range field.
But, this code should be fine as a baseline to start with.

Read in exactly one or two strings

I have a program that accepts input from the command line. While all of the available commands are one string, several of the commands require a secondary string to further define the action.
e.g. "end" is one command, and "add foo" is a second.
My code handles the 2 string inputs fine, but when I try to access a single string command (such as "end") the program waits for more input instead of acting immediately.
Is there some way I can get the program to read in exactly one line (which may have up to two strings) rather than the way it is now?
Here's how it's currently implemented:
while(1)
{
    scanf("%s%s", commandString,floorPath);
    if(!strcmp(commandString,"end")) return;
    //I've got several of these as an "if / else", but there's no
    //need to reprint them here.
}
Read the first string alone, and depending on the input decide if there is a need to read another string or not.
while(1)
{
    scanf("%s", commandString);
    if (requiresAnotherString(commandString))
    {
        scanf("%s", floorPath);
        // handle two string command
    }
    else
    {
        // handle one string command
    }
}
What MByD said, or alternatively, read a single line and then, instead of using scanf(), parse the line you read in to see if it is a one-word command or a two-word command, and take the appropriate action.
Here is an alternative:
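#include <stdio.h>
#include <string.h>

/* Reads one line from stdin into buf and stores pointers to its words in cmd[].
   The words point into buf, so buf must outlive cmd.
   Returns the number of words, or -1 on end of file or error. */
int cmdFromStdin (char **cmd, int maxWords, char *buf, int bufSize) {
    int count = 0;
    char *word;
    if (fgets (buf, bufSize, stdin) == NULL)
        return -1;
    for (word = strtok (buf, " \t\n") ; word != NULL && count < maxWords ;
         word = strtok (NULL, " \t\n"))
        cmd[count++] = word;
    return count;
}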
It's been a while since I have coded in C, but I remember having problems with scanf (so I used to use getline()). The strtok function will split the line into words; on return you can check the count and work on the array cmd. I believe you have to include stdio.h, stdlib.h and string.h for this. My C is a bit rusty, so please excuse the syntax errors.
