Problem with strtok and segmentation fault - c

I have two helper functions to break up strings in the format of decimal prices, i.e. "23.00", "2.30".
Consider this:
char price[4] = "2.20";
unsigned getDollars(char *price)
{
return atoi(strtok(price, "."));
}
unsigned getCents(char *price)
{
strtok(price, ".");
return atoi(strtok(NULL, "."));
}
Now when I run the below I get a segmentation fault:
printf("%u\n", getDollars(string));
printf("%u\n", getCents(string));
However, when I run them separately, without one following the other, they work fine. What am I missing here? Do I have to do some sort of resetting of strtok?
My solution:
With the knowledge about strtok I gained from the answer I chose below, I changed the implementation of the helper functions so that they copy the passed in string first, thus shielding the original string and preventing this problem:
#define MAX_PRICE_LEN 6 /* Assumes no price goes over 99.99: five characters plus the '\0' */
unsigned getDollars(char *price)
{
/* Copy the string to prevent strtok from changing the original */
char copy[MAX_PRICE_LEN];
char *tok;
/* Create a copy of the original string without writing past the buffer */
strncpy(copy, price, MAX_PRICE_LEN - 1);
copy[MAX_PRICE_LEN - 1] = '\0';
tok = strtok(copy, ".");
/* Return 0 if format was wrong */
if(tok == NULL) return 0;
else return atoi(tok);
}
unsigned getCents(char *price)
{
char copy[MAX_PRICE_LEN];
char *tok;
strncpy(copy, price, MAX_PRICE_LEN - 1);
copy[MAX_PRICE_LEN - 1] = '\0';
/* Skip this first part of the price */
if(strtok(copy, ".") == NULL) return 0;
tok = strtok(NULL, ".");
/* Return 0 if format was wrong */
if(tok == NULL) return 0;
else return atoi(tok);
}

Because strtok() modifies the input string, you run into problems when it fails to find the delimiter in the getCents() function after you call getDollars().
Note that strtok() returns a null pointer when it fails to find the delimiter. Your code does not check that strtok() found what it was looking for - which is always risky.
Your update to the question demonstrates that you have learned about at least some of the perils (evils?) of strtok(). However, I would suggest that a better solution would use just strchr().
First, we can observe that atoi() will stop converting at the '.' anyway, so we can simplify getDollars() to:
unsigned getDollars(const char *price)
{
return(atoi(price));
}
We can use strchr() - which does not modify the string - to find the '.' and then process the text after it:
unsigned getCents(const char *price)
{
const char *dot = strchr(price, '.');
return((dot == 0) ? 0 : atoi(dot+1));
}
Quite a lot simpler, I think.
One more gotcha: suppose the string is 26.6; you are going to have to work harder than the revised getCents() just above does to get that to return 60 instead of 6. Also, given 26.650, it will return 650, not 65.
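As a rough sketch of the extra work that gotcha calls for, the helper below looks at no more than the first two digits after the '.'. The name get_cents_two_digits is hypothetical and not part of the answer above. It assumes that a missing second digit means zero and that digits beyond the second should be truncated rather than rounded.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): returns the cents
 * part of a price string, using only the first two digits after the '.'.
 * A missing second digit counts as 0; extra digits are truncated. */
unsigned get_cents_two_digits(const char *price)
{
    const char *dot = strchr(price, '.');
    unsigned cents = 0;

    if (dot == NULL)
        return 0;                                  /* no fractional part */
    if (isdigit((unsigned char)dot[1])) {
        cents = (unsigned)(dot[1] - '0') * 10;     /* tens of cents */
        /* dot[1] is a digit, so dot[2] is still inside the string */
        if (isdigit((unsigned char)dot[2]))
            cents += (unsigned)(dot[2] - '0');     /* single cents */
    }
    return cents;
}
```

With this version, "26.6" gives 60 and "26.650" gives 65, while "2.20" still gives 20.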

This:
char price[4] = "2.20";
leaves out the nul terminator on price. I think you want this:
char price[5] = "2.20";
or better:
char price[] = "2.20";
So, you will run off the end of the buffer the second time you try to get a token out of price. You're just getting lucky that getCents() doesn't segfault every time you run it.
And you should almost always make a copy of a string before using strtok on it (to avoid the problem that Jonathan Leffler pointed out).
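A minimal sketch of the copy-then-tokenize advice, combined with checking every strtok() result. The function name split_price and the buffer size PRICE_BUF_LEN are assumptions made for this illustration, not something from the question.

```c
#include <stdlib.h>
#include <string.h>

#define PRICE_BUF_LEN 16   /* assumed upper bound, chosen for illustration */

/* Hypothetical helper: splits "D.CC" into dollars and cents without
 * touching the caller's string. Returns 0 on success, -1 if the input
 * is too long for the local buffer or has no '.' separated second part. */
int split_price(const char *price, unsigned *dollars, unsigned *cents)
{
    char copy[PRICE_BUF_LEN];
    char *tok;

    if (strlen(price) >= sizeof copy)
        return -1;                    /* would not fit with its '\0' */
    strcpy(copy, price);              /* safe: length checked above */

    tok = strtok(copy, ".");
    if (tok == NULL)
        return -1;                    /* nothing before the '.' */
    *dollars = (unsigned)atoi(tok);

    tok = strtok(NULL, ".");
    if (tok == NULL)
        return -1;                    /* nothing after the '.' */
    *cents = (unsigned)atoi(tok);
    return 0;
}
```

Because strtok() only ever sees the local copy, the caller can pass the same string to the function as often as needed, including a string literal.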

Related

C - Segmentation fault using strtok

I have this code that reads multiple files and prints certain values. After reading some of the files, my while loop stops at a certain point and a segmentation fault occurs ...
Here is my code
int main () {
const char s[2] = ",";
const char s2[2] = ":";
char var1[] = "fiftyTwoWeekHigh\"";
char *fiftyhigh;
char *fiftyhigh2;
char *fiftyhigh_token;
char *fiftyhigh2_token;
char var2[] = "fiftyTwoWeekLow\"";
char *fiftylow;
char *fiftylow2;
char *fiftylow_token;
char *fiftylow2_token;
char var3[] = "regularMarketPrice\"";
char *price;
char *price2;
char *price_token;
char *price2_token;
FILE *fp;
char* data = "./data/";
char* json = ".json";
char line[MAX_LINES];
char line2[MAX_LINES];
int len;
char* fichier = "./data/indices.txt";
fp = fopen(fichier, "r");
if (fp == NULL){
printf("Impossible d'ouvrir le fichier %s", fichier);
return 1;
}
while (fgets(line, sizeof(line), fp) != NULL) {
char fname[10000];
len = strlen(line);
if (line[len-1] == '\n') {
line[len-1] = 0;
}
int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
if (ret < 0) {
abort();
}
printf("%s\n", fname);
FILE* f = fopen(fname, "r");
while ( fgets( line2, MAX_LINES, f ) != NULL ) {
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
price = strstr(line2, var3);
price_token = strtok(price, s);
price2 = strstr(price_token, s2);
price2_token = strtok(price2, s2);
printf("%s\n", price2_token);
//printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
}
fclose(f);
}
fclose(fp);
return 0;
}
and the output is :
./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault
It looks like my program stops because it can't reach certain data in a certain file... Is there a way to allocate more memory? My MAX_LINES is already set to 6000.
I'm assuming that the lines in your file look something like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }
In other words it's some kind of JSON format. I'm assuming that the line starts with '{' so each line is a JSON object.
You read that line into line2, which now contains:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0
Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.
Now let's trace through the code here:
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
First you call strstr to find the position of "fiftyTwoWeekHigh". This will return a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0
Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips the colon and returns the next token, which (because the string we're looking at, which ends after "100", has no more colons) is the rest of the string, in other words, the number.
So you've gotten your number, but probably not in the way you expected? There was really no point in the second call to strtok since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
Now we try to find "fiftyTwoWeekLow":
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
This is basically the same process, and after you call strtok, line2 looks like this:
{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0
Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, then you'd have been unable to find it due to the \0 added after "fiftyTwoWeekHigh" earlier. In that case, strstr would have returned NULL, which would cause strtok to return NULL, and then you'd definitely have gotten a seg fault after passing NULL to strstr.
So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.
If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok then you should:
1. Read line2.
2. Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns null. This will break up the line into tokens that each look like "someField":100.
3. Isolate the field name and value from each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because it will change the internal state of strtok and you won't be able to use strtok(NULL, ",") to continue processing the line.
4. Test the field name, and depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother to strip off the quotes, just include them in the string you're comparing with.
5. Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.
You may be able to pass ",{}" as the delimiter to strtok in order to get rid of any open and close curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON only contains simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
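The steps above can be sketched roughly as follows. The struct quote container and the function name parse_quote_line are hypothetical, introduced only for this illustration. The sketch assumes each line is a flat JSON object whose values are plain numbers, and it adds braces, spaces and the newline to the delimiter set.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three values the question extracts. */
struct quote {
    double high, low, price;
    int have_high, have_low, have_price;   /* 1 if the field was seen */
};

/* Splits `line` (modified in place) into "name":value tokens in one
 * strtok() pass and uses strchr() to separate name from value.
 * Returns the number of recognised fields found. */
int parse_quote_line(char *line, struct quote *q)
{
    static const char delims[] = ",{} \n";
    struct quote empty = {0};
    int found = 0;
    char *tok;

    *q = empty;
    for (tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims)) {
        char *colon = strchr(tok, ':');
        if (colon == NULL)
            continue;                      /* not a "name":value pair */
        *colon = '\0';                     /* tok is now the quoted field name */
        if (strcmp(tok, "\"fiftyTwoWeekHigh\"") == 0) {
            q->high = strtod(colon + 1, NULL);
            q->have_high = 1;
            found++;
        } else if (strcmp(tok, "\"fiftyTwoWeekLow\"") == 0) {
            q->low = strtod(colon + 1, NULL);
            q->have_low = 1;
            found++;
        } else if (strcmp(tok, "\"regularMarketPrice\"") == 0) {
            q->price = strtod(colon + 1, NULL);
            q->have_price = 1;
            found++;
        }
    }
    return found;
}
```

Because every token is inspected, the order of the fields no longer matters, and a missing field simply leaves its have_ flag at 0 instead of handing a NULL pointer to strstr or printf.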
Did you mean '\0' ?
if (line[len-1] == '\n') {
line[len-1] = 0;
}
I advise you to use gdb to see where the segfault occurs and why.
I don't think you have to allocate more memory. The segfault may happen because you have run out of data and you still print the result.
Use if(price2_token!=NULL) printf("%s\n", price2_token); for example.

Nested strtok in c resulting in an infinite loop

I have the user enter a username, then I go to this file and extract the values corresponding to that particular user. I know the fault is with the way I am using strtok, as it only works for the first user.
Once I find the user, I want to stop searching in the file.
int fd;
fd=open(fileName,O_RDONLY,0744);
if (fd==-1)
{
printf("The file userDetails.txt failed to open.\n");
exit(1);
}
int fileSize = sizeof(fileOutput)/sizeof(fileOutput[0]); //size of file
printf("%d\n",fileSize);
int bytesRead = read(fd,&fileOutput,fileSize);
//TERMINATING BUFFER PROPERLY
fileOutput[bytesRead] = '\0';
printf("%s\n",fileOutput);
//READING LINE BY LINE IN FILE
char *line;
char *data;
char *name;
char *saltValue;
char *encryptedValue;
line = strtok(fileOutput,"\n"); //SPLIT ACCORDING TO LINE
while (line != NULL)
{
data = strtok(line, ":");
while (data != NULL)
{
name = data;
if (strcmp(name,userName)==0)
{
printf("%s\n","User exists");
saltValue = strtok(NULL,":");
printf("%s\n",saltValue);
encryptedValue = strtok(NULL, ":");
printf("%s\n",encryptedValue);
break;
}
else
{
break;
}
}
if (strcmp(name,userName)==0) //user found
{
break;
}
else //user not found
{
strtok(NULL,"\n");
}
}
If you are limited to read, that's fine, but you can only use strtok once on "\n" to parse each line from fileOutput, not nested again to parse the ':'. Otherwise, since strtok modifies the string by inserting '\0' at the delimiter found, you will be writing the nul-terminating character within lines that will cause the outer strtok to consider the string finished on the next iteration.
Instead, use a single pointer on each line with strchr (line, ':') to locate the first ':' within the line, and then call strncmp() using the pointer to the start of the line and the pointer locating ':'. For example, if you have a function to check if the userName is contained in your file (returning 0 on success and 1 on failure) you could do:
...
for (char *line = strtok(fileOutput,"\n"); line; line = strtok (NULL, "\n"))
{
char *p = strchr (line, ':'); /* find first ':' */
if (!p) /* if not found, bail */
break;
if (strlen (userName) == (size_t)(p - line) && /* same length, so "bobby" can't match "bob" */
strncmp (userName, line, p - line) == 0) { /* check name */
printf ("found user: %s hash: %s\n", userName, p+1);
return 0;
}
}
fputs ("user not found.\n", stdout);
return 1;
This is probably one of the simpler approaches you could take.
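If nesting two tokenizing loops is really wanted, POSIX offers strtok_r(), which keeps its position in a caller-supplied pointer instead of hidden static state. This is a different technique from the strchr approach above and is not available in strict ISO C. The sketch below assumes each line has the form name:salt:hash, as the question's variable names suggest; the function name find_user is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the strtok_r() declaration */
#include <string.h>

/* Hypothetical helper: searches `buf` (modified in place) for a line
 * "user:salt:hash". On success stores pointers to the salt and hash
 * fields and returns 1; returns 0 if the user is not found or the
 * line is incomplete. */
int find_user(char *buf, const char *user, char **salt, char **hash)
{
    char *line_state;    /* position of the outer (line) loop */
    char *field_state;   /* position of the inner (field) calls */
    char *line;

    for (line = strtok_r(buf, "\n", &line_state); line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        char *name = strtok_r(line, ":", &field_state);
        if (name == NULL || strcmp(name, user) != 0)
            continue;                         /* not this line */
        *salt = strtok_r(NULL, ":", &field_state);
        *hash = strtok_r(NULL, ":", &field_state);
        return *salt != NULL && *hash != NULL;
    }
    return 0;
}
```

Each loop has its own state variable, so the inner calls cannot disturb the outer one. Like strtok(), strtok_r() still collapses consecutive delimiters, so empty fields are skipped.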
strtok() modifies its input string and keeps its position in hidden static state, which makes it impossible to nest calls: the inner loop destroys the work of the outer strtok(), making it impossible to continue.
Apart from this, strtok() is not adequate for your problem for another reason: if you try to use it to parse the /etc/passwd file (or one of the similarly formatted files we deal with today), you'll run into trouble with empty fields. If you have an empty field (two consecutive : characters), strtok() will skip over both delimiters, and the empty field goes completely undetected. strtok is an old, legacy function that was written to cope with the three characters (space, \t and \n) that are used to separate arguments in the Bourne shell. In the case of /etc/passwd you need to cope with possibly empty fields, and that makes it impossible to use strtok() to parse them.
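To make the empty-field problem concrete, here is a small sketch that counts fields both ways. Both function names are hypothetical, and the second one assumes the separator is a single character other than '\0'.

```c
#include <string.h>

/* Counts fields the way strtok() sees them: a run of consecutive
 * delimiters is treated as one, so empty fields disappear.
 * Modifies `s` in place. */
int count_fields_strtok(char *s, const char *delim)
{
    int n = 0;
    char *tok;

    for (tok = strtok(s, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}

/* Counts fields separated by the single character `sep`, keeping
 * empty ones: every separator starts a new field. */
int count_fields_keep_empty(const char *s, char sep)
{
    int n = 1;
    const char *p;

    for (p = strchr(s, sep); p != NULL; p = strchr(p + 1, sep))
        n++;
    return n;
}
```

For a passwd-style line such as "root::0:0::/root:/bin/sh", the strchr-based count is 7 while the strtok-based count is only 5, because the two empty fields are silently dropped.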
You can easily use strchr() instead to search for the : of /etc/passwd in a non-skipping way, just write something like (you can encapsulate this in a function):
char *not_strtok_but_almost(char *s, char *delim)
{
static char *p = NULL; /* this makes the function non-reentrant, like strtok() */
char *saved;
if (s)
p = s; /* start scanning a new string */
if (p == NULL)
return NULL; /* nothing to scan, or the previous call reached the end */
/* the field starts here; it is empty if *p is already a delimiter */
saved = p;
/* search for the delimiter that ends this field */
while (*p && !strchr(delim, *p)) /* while *p not null, and not in the delimiter set */
p++;
/* *p is at the end of the string or p points to one of the delimiters */
if (*p)
*p++ = '\0'; /* terminate the field and step past this one delimiter only */
else
p = NULL; /* end of string: the next call returns NULL */
return saved;
}
This function has all the trouble of strtok(3), but you can use it (taking care of its non-reentrancy and the fact that it modifies the source string, which makes it impossible to nest across several loops) because it doesn't skip all the separators in one shot, but just stops after the first separator found.
To solve the nesting problem, you can proceed in a different way. Assume you have several identifiers separated by commas (as in the /etc/group file), which would seem to require nesting (it doesn't: as the names field is the last one, you are not going to call strtok again in the first loop except to get a NULL). You can process your file breadth first instead of depth first: you first find all the fields of the first level, and then go, field by field, reading its subfields (which will probably use a different delimiter).
As all of these modifications are done in the same string, there is no need to allocate a buffer for each field and strdup() the strings before use. The work can be done in place, strdup()ing the string once at the beginning if you need to store the different subfields.
Leave a comment if anything here is unclear (be careful: I have not tested the routine above, so it may well have a bug).
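The level-by-level idea can be sketched with plain strtok(), provided the first pass is completely finished before the second one starts. The function name split_level_first and the two limits are assumptions made for this illustration. The sketch also assumes a group-style line with no empty fields, since strtok() would skip those.

```c
#include <string.h>

#define MAX_FIELDS  8    /* assumed limits, chosen for illustration */
#define MAX_MEMBERS 16

/* Hypothetical helper: splits "name:passwd:gid:member1,member2" in two
 * passes over the same buffer (modified in place). Pass 1 collects all
 * ':' separated fields; pass 2 splits the last field on ','.
 * Returns the number of first-level fields found. */
int split_level_first(char *line, char *fields[MAX_FIELDS],
                      char *members[MAX_MEMBERS], int *nmembers)
{
    int nfields = 0;
    char *tok;

    /* Level 1: finish the whole ':' pass before starting another one. */
    for (tok = strtok(line, ":"); tok != NULL && nfields < MAX_FIELDS;
         tok = strtok(NULL, ":"))
        fields[nfields++] = tok;

    /* Level 2: strtok() is free again, so reuse it on one stored field. */
    *nmembers = 0;
    if (nfields > 0) {
        for (tok = strtok(fields[nfields - 1], ",");
             tok != NULL && *nmembers < MAX_MEMBERS;
             tok = strtok(NULL, ","))
            members[(*nmembers)++] = tok;
    }
    return nfields;
}
```

No copies are made: every pointer stored in fields[] and members[] points into the caller's buffer, which therefore has to outlive them.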

How to compare strings in C without hard coding the increments?

I have a buffer that holds a string from a CSV file that I opened and read. I split the string up by using strtok() and split on the " , ". So now my string looks like this:
char buff[BUFFER_SIZE] = "1000" "CAP_SETPCAP" "CAP_NET_RAW"
I want to make comparisons now for each section of the string, but for the life of me I cannot get it to work. I want to be able to do it without hard coding anything meaning I don't want to assume how many spaces I need to move over. For example to start at CAP_SETPCAP I don't want to have to put buff+5. Anybody know a better way to handle this?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define BUFFER_SIZE 1024
int main(int argc, char *argv[]) {
FILE *fp = fopen("csvtest.csv", "r");
char buff[BUFFER_SIZE];
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
//or I wanted to do string comparison, but I kept getting
//segmentation fault (core dumped)
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
fclose(fp);
return 0;
}
Your code has three different sections. Let's analyze them:
1. The strtok section
You get the data from the file and then you iterate on strtok:
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
You don't seem interested in what you find at the different positions: csvData is overwritten with each new token, and at the end of the loop it is NULL.
The only effect is that the commas in the original array buff are overwritten with '\0'. If you print buff you will only see "1000", because right after this substring comes the string terminator placed by strtok.
2. Searching "CAP_NET_RAW"
You now iterate on buff[i] until the string terminator. But the string terminator is after the first substring "1000"!
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
Furthermore, you search for CAP_NET_RAW, but even without the issue of the inner terminators, the comparison would never succeed. That's because (1) the string actually present in buff is "CAP_NET_RAW" (with double quotes); (2) that token is the last one in the row, and it will still have the trailing '\n' (fgets doesn't remove it).
By the way: I copied the code after your edit, and now there's no check on the strcmp() return value. I suppose it is a typo. Note: strcmp returns 0 if the strings match.
3. The strstr attempt
Finally, you look for the string using the strstr function. That's a clever idea. But as already said, buff doesn't contain it. Well, the buffer actually does contain it, but string utilities stop at the first '\0' they find.
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
So found will be NULL, and dereferencing a NULL pointer (that's what %s tells printf to do) will lead to a segmentation fault.
4. Conclusions
As a very simple way to find the only string you care about, I suggest using only strstr, without calling strtok first. Alternatively, you can still use strtok, but save the tokens in separate strings so that you can access them later.
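A minimal sketch of the second suggestion: keep a pointer to every token as strtok() produces it, then compare against the saved tokens. The names split_csv and find_token are hypothetical. The sketch assumes the fields in the file are not wrapped in double quotes and adds '\r' and '\n' to the delimiters so the last token has no trailing newline.

```c
#include <string.h>

/* Hypothetical helper: splits `buf` in place on commas and line endings
 * and stores a pointer to each token. Returns the number of tokens. */
int split_csv(char *buf, char *tokens[], int max_tokens)
{
    int n = 0;
    char *tok;

    for (tok = strtok(buf, ",\r\n"); tok != NULL && n < max_tokens;
         tok = strtok(NULL, ",\r\n"))
        tokens[n++] = tok;
    return n;
}

/* Hypothetical helper: returns the index of the first token equal to
 * `wanted`, or -1 if there is none. */
int find_token(char *tokens[], int n, const char *wanted)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(tokens[i], wanted) == 0)   /* 0 means the strings match */
            return i;
    return -1;
}
```

No offsets such as buff+5 are needed: each entry of tokens[] already points at the start of one field inside buff.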

atoi() returning strange value

I'm getting some very strange behavior out of the atoi command. I am trying to find 2 values out of a range with the format [1:2]
The string is created with a dynamic string-allocating macro (in case Sasprintf throws you).
It will be read in from a file at the project's end, however.
Anyway, I seem to be parsing my string correctly, given the correct values of token and token2. I'm confused, however, why calling atoi on token2 would give me a gibberish answer. Also, I found out in the midst of this that strtok is deprecated, I just haven't bothered switching it yet, until I solve this bug.
char *token;
char *token2;
int lsb = 0;
int msb = 0;
char *str = NULL;
Sasprintf(str,"[4:0]");
token = strtok(str,"[");
if(token != NULL)
{
token = strtok(token,":");
msb = atoi(token);
printf("%d\n", msb);
token2 = strtok(NULL,"]");
puts(token2);
lsb = atoi(token2);
printf("%d\n",token2);
}
OUTPUT
4
0
19853443
I think you need to change
printf("%d\n",token2);
to
printf("%d\n",lsb);
token2 is a char * and you cannot print that using %d. Doing so invokes undefined behavior.
That said, always check the return value of strtok() against NULL. Also, strtol() is a better alternative to atoi(), because it lets you detect conversion errors.
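As a sketch of what checked conversion could look like here, the function below uses strtol(), the integer member of the same family as strtod(), and parses the range without strtok() at all. The name parse_range is hypothetical. Overflow of long or int is not checked, to keep the example short.

```c
#include <stdlib.h>

/* Hypothetical helper: parses a range written as "[msb:lsb]".
 * Returns 0 on success, -1 if the text does not have that shape.
 * The input string is not modified. */
int parse_range(const char *s, int *msb, int *lsb)
{
    char *end;
    long hi, lo;

    if (*s != '[')
        return -1;
    hi = strtol(s + 1, &end, 10);
    if (end == s + 1 || *end != ':')
        return -1;                 /* no digits, or no ':' after them */
    s = end + 1;
    lo = strtol(s, &end, 10);
    if (end == s || *end != ']')
        return -1;                 /* no digits, or no closing ']' */
    *msb = (int)hi;
    *lsb = (int)lo;
    return 0;
}
```

Unlike atoi(), strtol() reports where it stopped through its second argument, which is what makes it possible to tell "0" apart from "no number at all".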
printf("%d\n",token2);
This is not how you print a string, use:
printf("%s\n",token2);
or
printf("%d\n", lsb);
to print the result of your conversion.

Cannot concatenate strtok's output variable. strcat and strtok

I’ve spent hours on this program and have put several hours online searching for alternatives to my methods and have been plagued with crashes and errors all evening…
I have a few things I'd like to achieve with this code. First I’ll explain my problems, then I’ll post the code and finally I’ll explain my need for the program.
The program outputs just the single words and the concatenate function does nothing. This seems like it should be simple enough to fix...
My first problem is that I cannot seem to get the concatenate function to work, I used the generic strcat function which didn't work and neither did another version I found on the internet ( that function is used here, it is called "mystrcat" ). I want to have the program read in a string and remove "delimiters" to create a single string comprised of every word within the original string. I am trying to do with strtok and a strcat function. If there is an easier or simpler way PLEASE I’m all ears.
Another problem, which isn’t necessarily a problem but an ugly mess: the seven lines following main. I’d prefer to initialize my variables as follows: char variable[amt]; but the code I found for strtok was using a pointer and the code for the strcat function was using pointers. A better understanding of pointers && addresses for strings would probably help me out long term. However I would like to get rid of some of those lines by any means necessary. I can’t have 6 lines dedicated to only 2 variables. When I have 10 variables I do not want 30 lines up top…
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *mystrcat(char *output, char *firstptr);
int main() {
char str[] = "now # is the time for all # good men to come to the # aid of their country";
char delims[] = "# ";
char resultOrig[70]; //was [20];
char firstOrig[20];
//char *result = NULL, *first = NULL;
char result = resultOrig; //was result = &resultOrig;
char first = firstOrig; //was first = &firstOrig;
first = strtok( str, delims );
while( first != NULL ) {
mystrcat(resultOrig, firstOrig);
printf( "%s ", first );
printf("\n %s this should be the concat\'d string so far\n", resultOrig);
first = strtok( NULL, delims );
}
system("pause");
return 0;
}
char *mystrcat(char *resultptr, char *firstptr)
{
char *output = resultptr;
while (*output != '\0')
output++;
while(*firstptr != '\0')
{
*output = *firstptr;
output++;
firstptr++;
}
*output = '\0';
return output;
}
This is just a test program right now but I was intending to use this for a list/database of files. My files have underscores, hyphens, periods, parentheses’, and numbers; all of which I would like to set as the “delimiters”. I was planning on going thru a loop, where I would delete a delimiter(each loop-thru change from _ to – to . etc…) and create a single string, I may want to replace the delimiters with a space or a period. And some files have spaces in them already along with the special characters I’d like to “delimit”.
I’m planning to do all this by means of scanning a text file. Within the file I also have a size in this format: “2,518,6452”. I’m hoping I can sort my database alphabetically or by size, ascending or descending. That’s just some additional information which may be useful to know for my specific questions above.
Below I have included some fictional samples of how these names could appear.
my_file(2009).ext
second.File-group1.extls
the.third.file-vol30.lmth
I am focusing this post on: the question on how to get the concatenate function working or an alternative to strcat and/or strtok. As well as asking for help to unclutter unnecessary or redundant code.
I appreciate all the help and even all those who read through my post.
Thank you so much!
strcat would work if you used first instead of firstOrig in your loop. No need for mystrcat. Can be simplified to:
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "now # is the time for all # good men to come to the # aid of their country";
char delims[] = "# ";
char result[100] = ""; /* Original size was too small */
char* token;
token = strtok(str, delims);
while(token != NULL) {
printf("token = '%s'\n", token);
strcat(result, token);
token = strtok(NULL, delims);
}
printf("%s\n", result);
return 0;
}
Output:
token = 'now'
token = 'is'
token = 'the'
token = 'time'
token = 'for'
token = 'all'
token = 'good'
token = 'men'
token = 'to'
token = 'come'
token = 'to'
token = 'the'
token = 'aid'
token = 'of'
token = 'their'
token = 'country'
nowisthetimeforallgoodmentocometotheaidoftheircountry
There are several problems here:
Missing initializations for resultOrig and firstOrig (as codaddict pointed out).
first = &firstOrig doesn't do what you want from it. You later do first = strtok(str, delims), which sets first to point to somewhere in str. It doesn't read data into firstOrig.
You allocate small buffers (just 20 bytes) and try to fill them with much more than that. This overflows the buffers on the stack, causing nasty bugs.
You've not initialized the following two strings:
char resultOrig[20];
char firstOrig[20];
and you are appending characters to them. Change them to:
char resultOrig[20] = "";
char firstOrig[20] = "";
Also the name of the character array gives its starting address. So
result = &resultOrig;
first = &firstOrig;
should be:
result = resultOrig;
first = firstOrig;
Change
mystrcat(resultOrig, firstOrig);
to
mystrcat(resultOrig, first);
Also make resultOrig large enough to hold the concatenations, like:
char resultOrig[100] = "";
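Since choosing a "large enough" size by hand is easy to get wrong, here is a sketch of a concatenation that checks the remaining space before each append. The names append_checked and join_tokens are hypothetical and not part of the answers above.

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dst only if the
 * result, including its '\0', fits in dst_size bytes. dst must already
 * hold a valid string. Returns 0 on success, -1 if it would not fit
 * (dst is left unchanged). */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (need >= dst_size - used)
        return -1;                         /* no room for src plus '\0' */
    memcpy(dst + used, src, need + 1);     /* copies the '\0' as well */
    return 0;
}

/* Hypothetical helper: joins all tokens of `str` (modified in place by
 * strtok) into `result`. Returns 0 on success, -1 if result is too small. */
int join_tokens(char *str, const char *delims, char *result, size_t result_size)
{
    char *tok;

    if (result_size == 0)
        return -1;
    result[0] = '\0';                      /* start from an empty string */
    for (tok = strtok(str, delims); tok != NULL; tok = strtok(NULL, delims))
        if (append_checked(result, result_size, tok) != 0)
            return -1;
    return 0;
}
```

With a buffer that is too small, the function reports the failure to the caller instead of writing past the end of the array.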

Resources