Nested strtok in C resulting in an infinite loop

I have the user enter a username, then I read this file and extract the values corresponding to that particular user. I know the fault is in the way I am using strtok, as it only works for the first user.
Once I find the user, I want to stop searching in the file.
int fd;
fd=open(fileName,O_RDONLY,0744);
if (fd==-1)
{
printf("The file userDetails.txt failed to open.\n");
exit(1);
}
int fileSize = sizeof(fileOutput)/sizeof(fileOutput[0]); //size of file
printf("%d\n",fileSize);
int bytesRead = read(fd,&fileOutput,fileSize);
//TERMINATING BUFFER PROPERLY
fileOutput[bytesRead] = '\0';
printf("%s\n",fileOutput);
//READING LINE BY LINE IN FILE
char *line;
char *data;
char *name;
char *saltValue;
char *encryptedValue;
line = strtok(fileOutput,"\n"); //SPLIT ACCORDING TO LINE
while (line != NULL)
{
data = strtok(line, ":");
while (data != NULL)
{
name = data;
if (strcmp(name,userName)==0)
{
printf("%s\n","User exists");
saltValue = strtok(NULL,":");
printf("%s\n",saltValue);
encryptedValue = strtok(NULL, ":");
printf("%s\n",encryptedValue);
break;
}
else
{
break;
}
}
if (strcmp(name,userName)==0) //user found
{
break;
}
else //user not found
{
strtok(NULL,"\n");
}
}

If you are limited to read, that's fine, but you can only use strtok once, on "\n", to parse each line from fileOutput; you cannot nest a second strtok to parse the ':'. strtok keeps a single internal position and modifies the string by writing '\0' over each delimiter it finds, so the inner calls replace the outer call's position and write nul characters within the line. On the next iteration the outer strtok then either considers the string finished or resumes in the wrong place.
Instead, use a single pointer on each line with strchr (line, ':') to locate the first ':' within the line, and then call strncmp() with the pointer to the start of the line and the length up to the ':'. For example, if you have a function to check whether userName is contained in your file (returning 0 on success and 1 on failure) you could do:
...
for (char *line = strtok(fileOutput,"\n"); line; line = strtok (NULL, "\n"))
{
char *p = strchr (line, ':'); /* find first ':' */
if (!p) /* if not found, bail */
break;
if (strlen (userName) == (size_t)(p - line) && strncmp (userName, line, p - line) == 0) { /* check name: same length and same characters */
printf ("found user: %s hash: %s\n", userName, p+1);
return 0;
}
}
fputs ("user not found.\n", stdout);
return 1;
This is probably one of the simpler approaches you could take.
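If POSIX strtok_r() is available on your system, the nested structure from the question can also be kept, because strtok_r() stores its position in a pointer supplied by the caller instead of in hidden static state. A minimal sketch, assuming lines of the form name:salt:hash; the helper name find_user and its parameters are made up for this illustration:

```c
#define _POSIX_C_SOURCE 200809L /* asks the headers to declare strtok_r() */
#include <string.h>

/* Hypothetical helper: search buf (lines of "name:salt:hash") for userName.
 * On success, stores pointers to the salt and hash fields (both point into
 * buf) and returns 0. Returns 1 if the user is not found. buf is modified. */
int find_user(char *buf, const char *userName, char **salt, char **hash)
{
    char *lineState;  /* saved position of the outer (line) tokenizer  */
    char *fieldState; /* saved position of the inner (field) tokenizer */

    for (char *line = strtok_r(buf, "\n", &lineState); line != NULL;
         line = strtok_r(NULL, "\n", &lineState)) {
        char *name = strtok_r(line, ":", &fieldState);

        if (name != NULL && strcmp(name, userName) == 0) {
            *salt = strtok_r(NULL, ":", &fieldState);
            *hash = strtok_r(NULL, ":", &fieldState);
            return 0; /* stop searching once the user is found */
        }
    }
    return 1;
}
```

With the variables from the question, the call would look like find_user(fileOutput, userName, &saltValue, &encryptedValue). Either output pointer can still be NULL if the matching line has fewer than three fields, so check them before printing.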

strtok() modifies its input string and keeps only one saved position, which makes it impossible to nest calls: the inner loop overwrites the state of the outer strtok(), so the outer loop cannot continue.
Apart from this, strtok() is not adequate for your problem for another reason: if you use it to parse the /etc/passwd file (or a file in a similar format) you will run into trouble with empty fields. If a field is empty (two consecutive : characters), strtok() skips over both delimiters and the empty field goes completely undetected. strtok() is an old, legacy function that was written to cope with the three whitespace characters (space, \t and \n) that separate arguments in the Bourne shell. In /etc/passwd you need to cope with possibly empty fields, and that makes strtok() unsuitable for parsing it.
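The empty-field problem is easy to see by counting fields in two ways. A minimal sketch; both helper names are made up for this illustration, and the 256-byte copy is an assumed limit:

```c
#include <stdio.h>
#include <string.h>

/* Count the tokens strtok() reports for line. */
int count_with_strtok(const char *line, const char *delim)
{
    char copy[256]; /* strtok() needs a writable copy; the size is an assumption */
    int n = 0;

    snprintf(copy, sizeof copy, "%s", line);
    for (char *tok = strtok(copy, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}

/* Count fields the way a ':'-separated record defines them:
 * one more than the number of ':' characters. */
int count_with_strchr(const char *line)
{
    int n = 1;

    for (const char *p = strchr(line, ':'); p != NULL; p = strchr(p + 1, ':'))
        n++;
    return n;
}
```

For the line root::0:0:root:/root:/bin/sh, count_with_strtok(line, ":") gives 6 while count_with_strchr(line) gives 7, because strtok() silently drops the empty second field.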
You can easily use strchr() instead to search for the : of /etc/passwd in a non-skipping way, just write something like (you can encapsulate this in a function):
char *not_strtok_but_almost(char *s, const char *delim)
{
    static char *p = NULL; /* this makes the function non-reentrant, like strtok() */
    char *saved;
    if (s)
        p = s; /* first call: start scanning the new string */
    if (!p)
        return NULL; /* the previous call reached the end of the string */
    saved = p; /* the field starts here; it may be empty */
    /* search for the first delimiter after the field */
    while (*p && !strchr(delim, *p)) /* while *p is not null and not in the delimiter set */
        p++;
    if (*p)
        *p++ = '\0'; /* terminate the field and step past this one delimiter only */
    else
        p = NULL; /* end of string: the next call returns NULL */
    return saved;
}
This function has all the drawbacks of strtok(3) (it is non-reentrant and it modifies the source string, so it cannot be nested across several loops), but you can use it here because it doesn't skip all the separators in one shot; it stops after the first separator found.
To solve the nesting problem, you can proceed differently. Suppose one of the fields holds several identifiers separated by a second delimiter (as the member list in the /etc/group file does). In that particular file the list is the last field, so the outer loop would not need to call strtok again except to get a NULL, but in general it would. You can then process the line breadth-first instead of depth-first: first find all the fields at the first level, and then go field by field, reading the subfields (which will probably use a different delimiter).
Since all of these modifications are made in the same string, there is no need to allocate a buffer for each field; the work can be done in place, and you only need to strdup() the string at the beginning if you have to keep the original or store the subfields separately.
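The breadth-first idea can be sketched as one small in-place splitter that is called once per level. The name split_fields and its parameters are assumptions made for this illustration, not part of any library:

```c
#include <string.h>

/* Split s in place at every occurrence of delim, keeping empty fields.
 * Stores up to max field pointers in out and returns how many were stored. */
size_t split_fields(char *s, char delim, char **out, size_t max)
{
    size_t n = 0;

    while (s != NULL && n < max) {
        out[n++] = s;         /* the current field starts here */
        s = strchr(s, delim); /* find the delimiter that ends it */
        if (s != NULL)
            *s++ = '\0';      /* terminate the field, continue after the delimiter */
    }
    return n;
}
```

For a line such as wheel:x:10:alice,bob you would first call split_fields(line, ':', fields, 8) to get the four top-level fields, and only afterwards call split_fields(fields[3], ',', members, 8) on the member list. Because the function keeps no hidden state, the second call cannot disturb the result of the first.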
Leave a comment if anything here is unclear (be careful: I have not tested the routine above, so it may well have a bug).

Related

C - Segmentation fault using strtok

I have this code that reads multiple files and prints certain values. After reading a few files, my while loop stops at some point with a segmentation fault...
Here is my code
int main () {
const char s[2] = ",";
const char s2[2] = ":";
char var1[] = "fiftyTwoWeekHigh\"";
char *fiftyhigh;
char *fiftyhigh2;
char *fiftyhigh_token;
char *fiftyhigh2_token;
char var2[] = "fiftyTwoWeekLow\"";
char *fiftylow;
char *fiftylow2;
char *fiftylow_token;
char *fiftylow2_token;
char var3[] = "regularMarketPrice\"";
char *price;
char *price2;
char *price_token;
char *price2_token;
FILE *fp;
char* data = "./data/";
char* json = ".json";
char line[MAX_LINES];
char line2[MAX_LINES];
int len;
char* fichier = "./data/indices.txt";
fp = fopen(fichier, "r");
if (fp == NULL){
printf("Impossible d'ouvrir le fichier %s", fichier);
return 1;
}
while (fgets(line, sizeof(line), fp) != NULL) {
char fname[10000];
len = strlen(line);
if (line[len-1] == '\n') {
line[len-1] = 0;
}
int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
if (ret < 0) {
abort();
}
printf("%s\n", fname);
FILE* f = fopen(fname, "r");
while ( fgets( line2, MAX_LINES, f ) != NULL ) {
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
price = strstr(line2, var3);
price_token = strtok(price, s);
price2 = strstr(price_token, s2);
price2_token = strtok(price2, s2);
printf("%s\n", price2_token);
//printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
}
fclose(f);
}
fclose(fp);
return 0;
}
and the output is :
./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault
It looks as if my program stops because it can't reach certain data in a certain file... Is there a way to allocate more memory? My MAX_LINES is already set to 6000.
I'm assuming that the lines in your file look something like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }
In other words it's some kind of JSON format. I'm assuming that the line starts with '{' so each line is a JSON object.
You read that line into line2, which now contains:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0
Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.
Now let's trace through the code here:
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
First you call strstr to find the position of "fiftyTwoWeekHigh". This will return a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0
Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips the colon and returns the next token, which (because the string we're looking at, which ends after "100", has no more colons) is the rest of the string, in other words, the number.
So you've gotten your number, but probably not in the way you expected? There was really no point in the second call to strtok since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
Now we try to find "fiftyTwoWeekLow:"
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
This is basically the same process, and after you call strtok, line2 looks like this:
{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0
Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, then you'd have been unable to find it due to the \0 added after "fiftyTwoWeekHigh" earlier. In that case, strstr would have returned NULL, which would cause strtok to return NULL, and then you'd definitely have gotten a seg fault after passing NULL to strstr.
So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.
If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok then you should:
1. Read line2.
2. Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns NULL. This will break up the line into tokens that each look like "someField":100.
3. Isolate the field name and value from each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because it will change the internal state of strtok and you won't be able to use strtok(NULL, ",") to continue processing the line.
4. Test the field name, and depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother to strip off the quotes, just include them in the string you're comparing with.
5. Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.
You may be able to pass ",{}" as the delimiter to strtok in order to get rid of any open and close curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON only contains simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
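The steps above can be sketched roughly as follows. This assumes a flat, single-line object with numeric values and no whitespace around the names; the struct quote type and the function name parse_quote are made up for this illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three values the question prints. */
struct quote {
    double high;
    double low;
    double price;
};

/* Parse one flat line such as {"a":1,"b":2}. The line is modified.
 * Returns 0 only if all three fields were present, -1 otherwise. */
int parse_quote(char *line, struct quote *q)
{
    int found = 0; /* bit 0: high, bit 1: low, bit 2: price */

    /* One strtok() loop only; braces and the newline are treated as delimiters. */
    for (char *tok = strtok(line, ",{}\n"); tok != NULL; tok = strtok(NULL, ",{}\n")) {
        char *colon = strchr(tok, ':');

        if (colon == NULL)
            continue;  /* not a "name":value pair */
        *colon = '\0'; /* tok is now the quoted name, colon + 1 is the value */

        if (strcmp(tok, "\"fiftyTwoWeekHigh\"") == 0) {
            q->high = strtod(colon + 1, NULL);
            found |= 1;
        } else if (strcmp(tok, "\"fiftyTwoWeekLow\"") == 0) {
            q->low = strtod(colon + 1, NULL);
            found |= 2;
        } else if (strcmp(tok, "\"regularMarketPrice\"") == 0) {
            q->price = strtod(colon + 1, NULL);
            found |= 4;
        }
    }
    return found == 7 ? 0 : -1;
}
```

Because the fields are matched by name, their order in the line no longer matters, and a missing field shows up as a return value of -1 instead of a NULL pointer passed to printf.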
Did you mean '\0' ?
if (line[len-1] == '\n') {
line[len-1] = 0;
}
I advise you to use gdb to see where the segfault occurs and why.
I don't think you need to allocate more memory. The segfault may happen because there is no more data and you still print the result.
Use if(price2_token!=NULL) printf("%s\n", price2_token); for example.

How can I make strtok include newlines at the end of a token?

In a program I am writing, I need to be able to tokenize an input text file into words, do some encoding, and then write to an output file. The problem is that I need to preserve the newlines.
The approach I was trying is to have strtok preserve the newlines at the end of a word, however, strtok will only include one newline character before moving on. If there is a following newline, it becomes its own token. How can I change this behavior so that tokens include all newlines before moving onto the next word?
int changeNewLine(char* p) {
p = p + (strlen(p)-1);
int newlines = 0;
while(*p == '\n') {
*p = '\0';
newlines++;
p--;
}
return newlines;
}
void main(int argc, char *argv[]) {
FILE *inputfile = fopen(argv[1],"rw");
FILE *outputfile = fopen("output.txt","wb");
char buffer[128];
char *token;
char words[MAX_CODE][WORDLEN];
int i = 0;
unsigned short newlines[MAX_CODE];
while(fgets(buffer, 128, inputfile)){
token = strtok(buffer," ");
while(token != NULL) {
newlines[i] = changeNewLine(token);
strcpy(words[i], token);
i++;
token = strtok(NULL," ");
}
}
...
}
Above is a fragment of my code. The idea is to count the number of newlines in a token, and then write them back out later.
strtok already does include newlines in the token, since you are using a delimiter string that does not contain the newline. But in your program as it now is, you will never have more than one in a token because fgets reads (at most) one line at a time. That's its whole purpose. It will never give you a string containing two or more newlines, nor containing a newline anywhere other than the last character.
Your general alternatives are
(1) to look ahead at subsequent lines in order to spot additional newlines, or
(2) to retrospectively update the previous line's newline count when you encounter a line starting with a newline (and, therefore, containing nothing else).
Alternative (1) could include employing an altogether different approach to reading input, too, such as a block read with fread() or a character-at-a-time read with fgetc().
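A character-at-a-time version with fgetc() might look like the sketch below. The limits MAX_WORDS and WORD_LEN and the function name read_words are assumptions made for this illustration; they stand in for MAX_CODE and WORDLEN from the question:

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORDS 64 /* assumed limits for this sketch */
#define WORD_LEN  32

/* Read whitespace-separated words from in. For each word, also count the
 * newlines that follow it before the next word starts.
 * Returns the number of words stored (at most max). */
int read_words(FILE *in, char words[][WORD_LEN], unsigned short newlines[], int max)
{
    int n = 0;      /* words stored so far */
    size_t len = 0; /* length of the word being built; 0 means "between words" */
    int c;          /* int, so that EOF can be distinguished from a character */

    while ((c = fgetc(in)) != EOF) {
        if (!isspace(c)) {
            if (len == 0) {            /* first character of a new word */
                if (n == max)
                    break;
                newlines[n++] = 0;
            }
            if (len < WORD_LEN - 1)    /* overlong words are truncated */
                words[n - 1][len++] = (char)c;
            words[n - 1][len] = '\0';
        } else {
            len = 0;                   /* the current word has ended */
            if (c == '\n' && n > 0)
                newlines[n - 1]++;     /* credit the newline to the previous word */
        }
    }
    return n;
}
```

Newlines that appear before the first word are not recorded here; if they matter, keep a separate counter for them.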

Split string using more than one char as delimiter

Let's say I have a string "file1.h: file2.c,file3.cpp" and I want to split it into "file1.h" and "file2.c,file3.cpp" - that is using : (: and whitespace) as delimiter. How can I do it?
I tried this code without success:
int main(int argc, char *argv[]) {
char str[] = "file1.h: file2.c,file3.cpp";
char name[100];
char depends[100];
sscanf(str, "%s: %s", name, depends);
printf("Name: %s\n", name);
printf("Deps: %s\n", depends);
}
And the output I get is:
Name: file1.h:
Deps:
What you seem to need is strtok(). Read about it in the man page. Related quote from C11, chapter §7.24.5.8
A sequence of calls to the strtok function breaks the string pointed to by s1 into a
sequence of tokens, each of which is delimited by a character from the string pointed to
by s2. [...]
In your case, you can use a delimiter like
char * delim = ": "; //combination of : and a space
to get the job done.
Things to mention additionally:
the input needs to be modifiable for strtok() (which it is, in your case), and
strtok() actually destroys the input fed to it, so keep a copy around if you need the original later.
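Applied to the string from the question, this could look like the following sketch. The helper name split_rule is made up for this illustration:

```c
#include <string.h>

/* Split "name: deps" in place using the delimiter set ": ".
 * Returns 0 on success, -1 if either part is missing. str is modified. */
int split_rule(char *str, char **name, char **deps)
{
    *name = strtok(str, ": ");  /* stops at the first ':' or ' ' */
    if (*name == NULL) {
        *deps = NULL;
        return -1;
    }
    *deps = strtok(NULL, ": "); /* skips the remaining delimiters, returns the next token */
    return *deps != NULL ? 0 : -1;
}
```

Note that every character in the delimiter string is a separate delimiter, so ": " means "a colon or a space", not the two-character sequence. A dependency list that itself contains spaces would therefore be cut at its first space.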
This is an alternative way to do it, it uses strchr(), but this assumes that the input string always has the format
name: item1,item2,item3,...,itemN
Here is the program
#include <ctype.h> /* isspace() */
#include <string.h>
#include <stdio.h>
int
main(void)
{
const char *const string = "file1.h: file2.c,file3.cpp ";
const char *head;
const char *tail;
const char *next;
// This basically makes a pointer to the `:'
head = string;
// If there is no `:' this string does not follow
// the assumption that the format is
//
// name: item1,item2,item3,...,itemN
//
if ((tail = strchr(head, ':')) == NULL)
return -1;
// Save a pointer to the next character after the `:'
next = tail + 1;
// Strip leading spaces
while (isspace((unsigned char) *head) != 0)
++head;
// Strip trailing spaces
while (isspace((unsigned char) *(tail - 1)) != 0)
--tail;
fputc('*', stdout);
// Simply print the characters between `head' and `tail'
// you could as well copy them, or whatever
fwrite(head, 1, tail - head, stdout);
fputc('*', stdout);
fputc('\n', stdout);
head = next;
while (head != NULL) {
tail = strchr(head, ',');
if (tail == NULL) {
// This means there are no more `,'
// so we now try to point to the end
// of the string
tail = strchr(head, '\0');
}
// This is basically the same algorithm
// just with a different delimiter which
// will presumably be the same from
// here
next = tail + 1;
// Strip leading spaces
while (isspace((unsigned char) *head) != 0)
++head;
// Strip trailing spaces
while (isspace((unsigned char) *(tail - 1)) != 0)
--tail;
// Here is where you can extract the string
// I print it surrounded by `*' to show that
// it's stripping white spaces
fputc('*', stdout);
fwrite(head, 1, tail - head, stdout);
fputc('*', stdout);
fputc('\n', stdout);
// Try to point to the next one
// or make head `NULL' if this is
// the end of the string
//
// Note that the original `tail' pointer
// that was pointing to the next `,' or
// the end of the string, has changed but
// we have saved its original value
// plus one, we now inspect what was
// there
if (*(next - 1) == '\0') {
head = NULL;
} else {
head = next;
}
}
fputc('\n', stderr);
return 0;
}
It's excessively commented to guide the reader.
As Sourav says, you really need to use strtok for tokenizing strings. But this doesn't explain why your existing code is not working.
The answer lies in the specification for sscanf and how it handles a '%s' in the format string.
From the man page:
s Matches a sequence of non-white-space characters;
So, the presence of a colon-space in your format string is largely irrelevant for matching the first '%s'. When sscanf sees the first %s it simply consumes the input string until a whitespace character is encountered, giving you your value for name of "file1.h:" (note the inclusion of the colon).
Next it tries to deal with the colon-space sequence in your format string.
Again, from the man page
The format string consists of a sequence of directives which describe how to process the sequence of input characters.
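The colon in the format string is an ordinary character, which sscanf treats as a directive that must match the next input character exactly. Because %s has already consumed the colon, the next input character is the space, so you get a matching failure and depends is never filled in.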
If, instead, your format string was simply "%s%s", then sscanf will get you almost exactly what you want.
int main(int argc, char *argv[]) {
char str[] = "file1.h: file2.c,file3.cpp";
char name[100];
char depends[100];
sscanf(str, "%s%s", name, depends);
printf("str: '%s'\n", str);
printf("Name: %s\n", name);
printf("Deps: %s\n", depends);
return 0;
}
Which gives this output:
str: 'file1.h: file2.c,file3.cpp'
Name: file1.h:
Deps: file2.c,file3.cpp
At this point, you can simply check that sscanf gave a return value of 2 (i.e. it found two values), and that the last character of name is a colon. Then just truncate name and you have your answer.
Of course, by this logic, you aren't going to be able to use sscanf to parse your depends variable into multiple strings ... which is why others are recommending using strtok, strpbrk etc because you are both parsing and tokenizing your input.
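If you want sscanf itself to stop at the colon, a scanset conversion is one option; this is a different technique from the "%s%s" approach above. A minimal sketch, where the helper name scan_rule is made up and both output buffers are assumed to hold at least 100 bytes, as in the question:

```c
#include <stdio.h>

/* Hypothetical helper: name and deps must each have room for 100 bytes.
 * Returns 0 if both parts were read, -1 otherwise. */
int scan_rule(const char *str, char *name, char *deps)
{
    /* %99[^:] reads up to 99 characters that are not ':',
     * the literal ':' must then match the input,
     * the space skips any whitespace,
     * and %99s reads up to the next whitespace. */
    return sscanf(str, "%99[^:]: %99s", name, deps) == 2 ? 0 : -1;
}
```

The field widths keep sscanf from writing past the end of either buffer, and checking the return value against 2 catches input that has no colon or nothing after it.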
Well, I am pretty late. I do not have much knowledge on inbuilt functions in C. So I started writing a solution for you. I don't think you need this now. But, anyway here it is and modify it as per your need. If you find any bug feel free to tell.

How to read a text file into matrix form in C

Given the following text file with the following content in it
SpotA B C
SpotB pass D
Spotc A E F
How do I break up the words into tokens and store them in a 10 x 10 matrix?
Note that if the content of the file forms a matrix smaller than 10 x 10, I want to put the character ~ in the unused positions.
So far this is my code:
char *matrix[10][10];
int loadFileToMatrix(char *filename){
FILE *fp;
int row = 0;
int col= 0;
char *tokens;
char buffer[1000];
fp = fopen(filename,"r");
if(fp == NULL){
perror(filename);
return(1);
}
while((fgets(buffer, sizeof(buffer), fp))!= NULL) {
tokens = strtok(buffer," ");
map[row++][col++] = tokens;
}
return(0);
}
If someone can help me figure out how to achieve my goal, that would be nice. Currently, I am really confused about how to proceed.
Just use fscanf to read tokens from the file into buffer, then copy each token into your matrix. You can use fgetc to detect whether you have reached the end of a line or the end of the file.
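int ch; /* int rather than char, so that EOF can be told apart from real characters */
while (row < 10 && col < 10) {
    if (fscanf(fp, "%999s", buffer) != 1) {
        break; // no more tokens
    }
    matrix[row][col] = malloc(strlen(buffer) + 1);
    if (matrix[row][col] == NULL) {
        break; // out of memory
    }
    strcpy(matrix[row][col], buffer);
    ch = fgetc(fp);
    if (ch == ' ') {
        col += 1;
    }
    else if (ch == '\n') {
        row += 1;
        col = 0;
    }
    else if (ch == EOF) {
        break; // end of file.
    }
}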
strtok() is a weird function.
The key part of the man page is this:
"On the first call to strtok() the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str should be NULL."
The reason for this is that strtok() alters the string you pass it. It searches through a string until it finds the next character that matches one of the delimiters, and then replaces that delimiter with a null terminator. If the delimiter is found at position n, internally, strtok() saves the position n+1 as the start of the rest of the string.
By calling strtok a second time with a non-null value, you are telling the function to start all over again at the start of that string, and try again to find a delimiter -- which it can never do, because it already found the first one. Instead, your second call to strtok() should pass NULL as the first argument, so each pass can bring out the next token.
If for some reason you need to call strtok() on multiple strings simultaneously, you will overwrite the internally-saved address; only the most recent call is saved properly. The reentrant function strtok_r() is useful in that situation.
If you're ever not sure how to use a function, the man pages are the best resource. You can type man strtok at the command line, or even just google it.
It looks like, in this case, you're using strtok() only once. This will just return the address of the first piece of the buffer, delimited by your delimiters. You need to call strtok() in a loop to get each piece in turn.
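Putting this together for the matrix from the question, a sketch could look like the following. The function name load_matrix and the ROWS and COLS macros are assumptions made for this illustration; every cell is first set to "~" so that unused positions are already padded:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROWS 10
#define COLS 10

/* Hypothetical helper: fill matrix from fp, one line per row.
 * Tokens are stored as heap copies; unused cells point to the literal "~".
 * Returns the number of rows read, or -1 if an allocation fails. */
int load_matrix(FILE *fp, char *matrix[ROWS][COLS])
{
    char buffer[1000];
    int row = 0;

    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            matrix[r][c] = "~"; /* padding for cells the file does not fill */

    while (row < ROWS && fgets(buffer, sizeof buffer, fp) != NULL) {
        int col = 0;

        /* The first call passes buffer, every later call passes NULL. */
        for (char *tok = strtok(buffer, " \t\n"); tok != NULL && col < COLS;
             tok = strtok(NULL, " \t\n")) {
            char *copy = malloc(strlen(tok) + 1); /* buffer is reused for the next line */
            if (copy == NULL)
                return -1;
            strcpy(copy, tok);
            matrix[row][col++] = copy;
        }
        row++;
    }
    return row;
}
```

The tokens have to be copied because strtok() returns pointers into buffer, which is overwritten by the next fgets() call. When freeing the matrix later, skip the cells that still point to "~".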

How I can skip a blank line in an input file when using strtok?

I want to parse lines of a file using strtok; the values are comma separated. However, strtok also returns tokens for blank lines that contain only spaces. Isn't it supposed to return a null pointer in such a situation?
How can I ignore such a line? I tried to check NULL, but as mentioned above it doesn't work.
void function_name(void)
{
const char delimiter[] = ",";
char line_read[9000];
char keep_me[9000];
int i = 0;
while(fgets(line_read, sizeof(line_read), filename) != NULL)
{
/*
* Check if the line read in contains anything
*/
if(line_read != NULL){
keep_me[i] = strtok(line_read, delimiter);
i++;
}
}
}
So to explain.
You're reading in your file using a while loop which reads the entire file line by line (fgets) into the array line_read.
Every time it reads in a line it will check to see if it contains anything (the NULL check).
If it does contain something, it will parse it using strtok and store the result in keep_me; otherwise the line just stays in the line_read array, which you obviously don't use in your program.
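One caveat about the check itself: line_read is an array, so comparing it with NULL is always true, and strtok with "," as the only delimiter returns a line of spaces as one token, because spaces are not delimiters there. A way to skip such lines is to test for them explicitly before tokenizing. A minimal sketch, with a helper name (is_blank) made up for this illustration:

```c
#include <string.h>

/* Returns 1 if line contains only whitespace (or nothing at all), 0 otherwise. */
int is_blank(const char *line)
{
    /* strspn() returns the length of the leading run of whitespace;
     * if that run reaches the terminating '\0', nothing else is on the line. */
    return line[strspn(line, " \t\r\n")] == '\0';
}
```

Inside the fgets loop you could then write if (is_blank(line_read)) continue; before calling strtok.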
