How to compare strings in C without hard coding the increments? - c

I have a buffer that holds a string from a CSV file that I opened and read. I split the string up by using strtok() and split on the " , ". So now my string looks like this:
char buff[BUFFER_SIZE] = "1000" "CAP_SETPCAP" "CAP_NET_RAW"
I want to make comparisons now for each section of the string, but for the life of me I cannot get it to work. I want to be able to do it without hard coding anything meaning I don't want to assume how many spaces I need to move over. For example to start at CAP_SETPCAP I don't want to have to put buff+5. Anybody know a better way to handle this?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define BUFFER_SIZE 1024
int main(int argc, char *argv[]) {
FILE *fp = fopen("csvtest.csv", "r");
char buff[BUFFER_SIZE];
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
//or I wanted to do string comparison, but I kept getting
//segmentation fault (core dumped)
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
fclose(fp);
return 0;
}

Your code has three different sections. Let's analyze them:
1. The strtok section
You get the data from the file and then you iterate on strtok:
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
You don't seem to be interested in what you find at the different positions: csvData is overwritten with each new token, and when the loop ends it is NULL.
The only effect is that the commas in the original array buff are overwritten with '\0'. If you print buff you will see only "1000", because strtok placed a string terminator right after this substring.
2. Searching "CAP_NET_RAW"
You now iterate on buff[i] until the string terminator. But the string terminator is after the first substring "1000"!
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
Furthermore you search for CAP_NET_RAW, but even without the inner terminators issue, the comparison would never succeed. That's because (1) the string actually present in buff is "CAP_NET_RAW" (with double quotes); (2) that token is the last of the row, and it will still have the trailing '\n' (fgets doesn't remove it).
By the way: I copied the code after your edit, and now there's no check on the strcmp() return value. I suppose it is a typo. Note: strcmp returns 0 if the strings match. Note also that i is never initialized before it is used as an index.
3. The strstr attempt
Finally you look for the string using the strstr function. That's a clever idea. But as already said, buff doesn't contain it. Well, the buffer actually does contain it, but string utilities stop at the first '\0' they find.
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
So found will be NULL, and passing a NULL pointer to printf for %s (which tells printf to dereference it) is undefined behavior that typically leads to a segmentation fault.
4. Conclusions
As a very simple way to find the only string you care about, I suggest using only strstr, without calling strtok first. Alternatively you can still use strtok, but save the tokens in separate strings so that you can access them later.

Related

C - Segmentation fault using strtok

I have this code that reads multiple files and prints certain values. After reading some of the files, my while loop stops at a certain moment and shows a segmentation fault ...
Here is my code
int main () {
const char s[2] = ",";
const char s2[2] = ":";
char var1[] = "fiftyTwoWeekHigh\"";
char *fiftyhigh;
char *fiftyhigh2;
char *fiftyhigh_token;
char *fiftyhigh2_token;
char var2[] = "fiftyTwoWeekLow\"";
char *fiftylow;
char *fiftylow2;
char *fiftylow_token;
char *fiftylow2_token;
char var3[] = "regularMarketPrice\"";
char *price;
char *price2;
char *price_token;
char *price2_token;
FILE *fp;
char* data = "./data/";
char* json = ".json";
char line[MAX_LINES];
char line2[MAX_LINES];
int len;
char* fichier = "./data/indices.txt";
fp = fopen(fichier, "r");
if (fp == NULL){
printf("Impossible d'ouvrir le fichier %s", fichier);
return 1;
}
while (fgets(line, sizeof(line), fp) != NULL) {
char fname[10000];
len = strlen(line);
if (line[len-1] == '\n') {
line[len-1] = 0;
}
int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
if (ret < 0) {
abort();
}
printf("%s\n", fname);
FILE* f = fopen(fname, "r");
while ( fgets( line2, MAX_LINES, f ) != NULL ) {
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
price = strstr(line2, var3);
price_token = strtok(price, s);
price2 = strstr(price_token, s2);
price2_token = strtok(price2, s2);
printf("%s\n", price2_token);
//printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
}
fclose(f);
}
fclose(fp);
return 0;
}
and the output is :
./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault
It is as if my program stops because it can't reach certain data in a certain file... Is there a way to allocate more memory? My MAX_LINES is already set to 6000.
I'm assuming that the lines in your file look something like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }
In other words it's some kind of JSON format. I'm assuming that the line starts with '{' so each line is a JSON object.
You read that line into line2, which now contains:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0
Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.
Now let's trace through the code here:
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
First you call strstr to find the position of "fiftyTwoWeekHigh". This will return a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0
Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips the colon and returns the next token, which (because the string we're looking at, which ends after "100", has no more colons) is the rest of the string, in other words, the number.
So you've gotten your number, but probably not in the way you expected? There was really no point in the second call to strtok since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
Now we try to find "fiftyTwoWeekLow:"
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
This is basically the same process, and after you call strtok, line2 looks like this:
{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0
Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, then you'd have been unable to find it due to the \0 added after "fiftyTwoWeekHigh" earlier. In that case, strstr would have returned NULL. Passing NULL to strtok makes it continue the previous scan, which is already exhausted at that point, so strtok would return NULL as well, and then you'd almost certainly have gotten a seg fault after passing NULL to strstr.
So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.
If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok then you should:
Read line2.
Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns null. This will break up the line into tokens that each look like "someField":100.
Isolate the field name and value from each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because it will change the internal state of strtok and you won't be able to use strtok(NULL, ",") to continue processing the line.
Test the field name, and depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother to strip off the quotes, just include them in the string you're comparing with.
Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.
You may be able to pass ",{}" as the delimiter to strtok in order to get rid of any open and close curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON only contains simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
Did you mean '\0' ?
if (line[len-1] == '\n') {
line[len-1] = 0;
}
I advise you to use gdb to see where the segfault occurs and why.
I don't think you have to allocate more memory. But the segfault may happen because there is no more data and you still print the result.
Use if(price2_token!=NULL) printf("%s\n", price2_token); for example.

Nested strtok in c resulting in an infinite loop

I have the user enter a username, and I then go to this file and extract the values corresponding to that particular user. I know the fault is with the way that I am using strtok, as it only works for the first user.
Once I find the user, I want to stop searching in the file.
int fd;
fd=open(fileName,O_RDONLY,0744);
if (fd==-1)
{
printf("The file userDetails.txt failed to open.\n");
exit(1);
}
int fileSize = sizeof(fileOutput)/sizeof(fileOutput[0]); //size of file
printf("%d\n",fileSize);
int bytesRead = read(fd,&fileOutput,fileSize);
//TERMINATING BUFFER PROPERLY
fileOutput[bytesRead] = '\0';
printf("%s\n",fileOutput);
//READING LINE BY LINE IN FILE
char *line;
char *data;
char *name;
char *saltValue;
char *encryptedValue;
line = strtok(fileOutput,"\n"); //SPLIT ACCORDING TO LINE
while (line != NULL)
{
data = strtok(line, ":");
while (data != NULL)
{
name = data;
if (strcmp(name,userName)==0)
{
printf("%s\n","User exists");
saltValue = strtok(NULL,":");
printf("%s\n",saltValue);
encryptedValue = strtok(NULL, ":");
printf("%s\n",encryptedValue);
break;
}
else
{
break;
}
}
if (strcmp(name,userName)==0) //user found
{
break;
}
else //user not found
{
strtok(NULL,"\n");
}
}
If you are limited to read, that's fine, but you can only use strtok once on "\n" to parse each line from fileOutput, not nested again to parse the ':'. Otherwise, since strtok modifies the string by inserting '\0' at the delimiter found, you will be writing the nul-terminating character within lines that will cause the outer strtok to consider the string finished on the next iteration.
Instead, use a single pointer on each line with strchr (line, ':') to locate the first ':' within the line, and then call strncmp() using the pointer to the start of the line and the pointer locating the ':'. For example, if you have a function to check if the userName is contained in your file (returning 0 on success and 1 on failure) you could do:
...
for (char *line = strtok(fileOutput,"\n"); line; line = strtok (NULL, "\n"))
{
char *p = strchr (line, ':'); /* find first ':' */
if (!p) /* if not found, bail */
break;
if ((size_t)(p - line) == strlen (userName) && strncmp (userName, line, p - line) == 0) { /* check name (same length, same characters) */
printf ("found user: %s hash: %s\n", userName, p+1);
return 0;
}
}
fputs ("user not found.\n", stdout);
return 1;
This is probably one of the simpler approaches you could take.
strtok() modifies its input string and keeps a single internal position, which makes it impossible to nest calls to it: the workings of the inner loop destroy the work of the outer strtok(), making it impossible to continue.
Apart from this, strtok() is not adequate for your problem for another reason: if you try to use it to parse the /etc/passwd file (or one of the similarly formatted files that we deal with today) you'll run into trouble with empty fields. If you have an empty field (two consecutive : chars), strtok() will skip over both, and the empty field goes completely undetected. strtok() is an old, legacy function that was written to cope with the three characters (space, \n and \t) that are used to separate arguments in the Bourne shell. In the case of /etc/passwd you need to cope with possibly empty fields, and that makes it impossible to use strtok() to parse them.
You can easily use strchr() instead to search for the : of /etc/passwd in a non-skipping way, just write something like (you can encapsulate this in a function):
char *not_strtok_but_almost(char *s, const char *delim)
{
    static char *p = NULL; /* this makes the function non-reentrant, like strtok() */
    char *saved;
    if (s)
        p = s; /* first call: start at the beginning of the string */
    if (!p)
        return NULL; /* no string given, or all fields already returned */
    saved = p; /* the field starts here, even if it is empty */
    /* search for the delimiter that ends this field */
    while (*p && !strchr(delim, *p)) /* while *p not null, and not in the delimiter set */
        p++;
    /* *p is at the end of the string or p points to one of the delimiters */
    if (*p)
        *p++ = '\0'; /* terminate the field and step past that single delimiter */
    else
        p = NULL; /* end of string: the next call returns NULL */
    return saved;
}
This function has all the problems of strtok(3), but you can use it (taking care of its non-reentrancy and of the fact that it modifies the source string, which makes it non-nestable across several loops) because it doesn't skip all the separators in one shot: it stops right after the first separator found.
To solve the nesting problem, you can proceed in a different way. Let's assume one field holds several identifiers separated by a second delimiter (as in the /etc/group file), which would seem to require nested tokenizing (it doesn't really: the names field is the last one, so you are not going to call strtok again in the outer loop except to get a NULL). You can process your file breadth first instead of depth first: you first split out all the fields at the first level, and then go field by field, splitting each into its subfields (which will probably use a different delimiter).
As all of these modifications are done in the same string, there is no need to allocate a buffer for each field and strdup() the strings before use. The work can be done in place, strdup()ing the string once at the beginning only if you need to store the different subfields.
Leave a comment if anything about this is unclear (be careful: I have not tested the routine above, so it may well contain a bug).

strtok() C-Strings to Array

Currently learning C, Having some trouble with passing c-string tokens into array. Lines come in by standard input, strtok is used to split the line up, and I want to put each into an array properly. an EOF check is required for exiting the input stream. Here's what I have, set up so that it will print the tokens back to me (these tokens will be converted to ASCII in a different code segment, just trying to get this part to work first).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char string[1024]; //Initialize a char array of 1024 (input limit)
char *token;
char *token_arr[1024]; //array to store tokens.
char *out; //used
int count = 0;
while(fgets(string, 1023, stdin) != NULL) //Read lines from standard input until EOF is detected.
{
if (count == 0)
token = strtok(string, " \n"); //If first loop, Get the first token of current input
while (token != NULL) //read tokens into the array and increment the counter until all tokens are stored
{
token_arr[count] = token;
count++;
token = strtok(NULL, " \n");
}
}
for (int i = 0; i < count; i++)
printf("%s\n", token_arr[i]);
return 0;
}
this seems like proper logic to me, but then i'm still learning. The issue seems to be with streaming in multiple lines before sending the EOF signal with ctrl-D.
For example, given an input of:
this line will be fine
the program returns:
this
line
will
be
fine
But if given:
none of this
is going to work
It returns:
is going to work
ing to work
to work
any help is greatly appreciated. I'll keep working at it in the meantime.
There are a couple of issues here:
You never call token = strtok(string, " \n"); again once the string is "reset" to a new value, so strtok() still thinks it is tokenizing your original string.
strtok is returning pointers to "substrings" inside string. You are changing the contents of what is in string and so your second line effectively corrupts your first (since the original contents of string are overwritten).
To do what you want you need to either read each line into a different buffer or duplicate the strings returned by strtok (strdup() is one way - just remember to free() each copy...)

How to read a text file into matrix form in C

Given the following text file with the following content in it
SpotA B C
SpotB pass D
Spotc A E F
How do I break up the words into tokens and store them in a 10 x 10 matrix?
Note that if the content of the file forms a matrix smaller than 10 x 10, I want to put the character ~ in the unused positions.
So far this is my code:
char *matrix[10][10];
int loadFileToMatrix(char *filename){
FILE *fp;
int row = 0;
int col= 0;
char *tokens;
char buffer[1000];
fp = fopen(filename,"r");
if(fp == NULL){
perror(filename);
return(1);
}
while((fgets(buffer, sizeof(buffer), fp))!= NULL) {
tokens = strtok(buffer," ");
map[row++][col++] = tokens;
}
return(0);
}
If someone can help me figure out how to achieve my goal, that would be nice. Currently, I am really confused about how to proceed.
Just use fscanf to read tokens from the file into buffer, then copy each token into your matrix. You can use fgetc to detect whether you have reached the end of a line or the end of the file.
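int ch; /* int rather than char, so that EOF can be told apart from real characters */
while (row < 10 && col < 10 && fscanf(fp, "%999s", buffer) == 1) { /* %999s fits buffer[1000] */
    matrix[row][col] = malloc(strlen(buffer) + 1);
    if (matrix[row][col] == NULL)
        break; /* out of memory */
    strcpy(matrix[row][col], buffer);
    ch = fgetc(fp); /* the character right after the token */
    if (ch == ' ') {
        col += 1;
    }
    else if (ch == '\n') {
        row += 1;
        col = 0;
    }
    else if (ch == EOF) {
        break; // end of file.
    }
}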
strtok() is a weird function.
The key part of the man page is this:
"On the first call to strtok() the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str should be NULL."
The reason for this is that strtok() alters the string you pass it. It searches through a string until it finds the next character that matches one of the delimiters, and then replaces that delimiter with a null terminator. If the delimiter is found at position n, internally, strtok() saves the position n+1 as the start of the rest of the string.
By calling strtok a second time with a non-null value, you are telling the function to start all over again at the start of that string, and try again to find a delimiter -- which it can never do, because it already found the first one. Instead, your second call to strtok() should pass NULL as the first argument, so each pass can bring out the next token.
If for some reason you need to call strtok() on multiple strings simultaneously, you will overwrite the internally-saved address; only the most recent call is saved properly. The reentrant function strtok_r() is useful in that situation.
If you're ever not sure how to use a function, the man pages are the best resource. You can type man strtok at the command line, or even just google it.
It looks like, in this case, you're using strtok() only once. This will just return the address of the first piece of the buffer, delimited by your delimiters. You need to call strtok() in a loop to get each piece in turn.

Remove the first part of a C String

I'm having a lot of trouble figuring this out. I have a C string, and I want to remove the first part of it. Let's say its: "Food,Amount,Calories". I want to copy out each one of those values, but not the commas. I find the comma, and return the position of the comma to my method. Then I use
strncpy(aLine.field[i], theLine, end);
To copy "theLine" to my array at position "i", with only the first "end" characters (for the first time, "end" would be 4, because that is where the first comma is). But then, because it's in a Loop, I want to remove "Food," from the array, and do the process over again. However, I cannot see how I can remove the first part (or move the array pointer forward?) and keep the rest of it. Any help would be useful!
What you need is to chop off strings with comma as your delimiter.
You need strtok to do this. Here's an example code for you:
int main (int argc, const char * argv[]) {
char *s = "asdf,1234,qwer";
char str[15];
strcpy(str, s);
printf("\nstr: %s", str);
char *tok = strtok(str, ",");
printf("\ntok: %s", tok);
tok = strtok(NULL, ",");
printf("\ntok: %s", tok);
tok = strtok(NULL, ",");
printf("\ntok: %s", tok);
return 0;
}
This will give you the following output:
str: asdf,1234,qwer
tok: asdf
tok: 1234
tok: qwer
If you have to keep the original string, then use strtok on a copy of it (strtok modifies the string it scans). If not, you can replace each separator with '\0' yourself, and use the obtained strings directly:
char s_RO[] = "abc,123,xxxx", *s = s_RO;
while (s){
char* old_str = s;
s = strchr(s, ',');
if (s){
*s = '\0';
s++;
};
printf("found string %s\n", old_str);
};
The function you might want to use is strtok()
Here is a nice example - http://www.cplusplus.com/reference/clibrary/cstring/strtok/
Personally, I would use strtok().
I would not recommend removing extracted tokens from the string. Removing part of a string requires copying the remaining characters, which is not very efficient.
Instead, you should keep track of your positions and just copy the sections you want to the new string.
But, again, I would use strtok().
if you know where the comma is, you can just keep reading the string from that point on.
for example
void readTheString(const char *theLine)
{
    const char *wordStart = theLine;
    const char *wordEnd = theLine;
    while (*wordStart) // while we haven't reached the null termination character
    {
        while (*wordEnd && *wordEnd != ',') // stop at a comma or at the end of the string
            wordEnd++;
        // ... copy the substring ranging from wordStart to wordEnd
        if (*wordEnd == ',')
            wordEnd++; // step over the comma
        wordStart = wordEnd; // start the next word
    }
}
or something like that.
the inner loop has to check for the null terminator as well, otherwise it runs past the end of the string unless the string also ends with a ','... but you get the idea.
anyway, using strtok would probably be a better idea.
