Splitting strings with scanf in C - c

I have a string that has # and ! symbols throughout it. My task is to separate all the content in between those symbols into new strings. I don't know how many symbols there are going to be.
Can I use the scanf function to separate them into strings?
I was thinking something like this:
input string: dawddwamars#dawdjiawjd!fjejafi!djiajoa#jdawijd#
char s1[20], s2[20], s3[20], s4[20], s5[20];
scanf("%s[^!][^#]%s[^!][^#]%s[^!][^#]%s[^!][^#]%s[^!][^#]", s1, s2, s3, s4, s5);
Would this work? Or does someone have a better method?
I need to separate the string into substrings because I have to search for the longest common substring in those new strings.

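To get you started, here is some code. It is not flexible about the number of substrings it can store: the strings array holds at most 10 pieces of up to 19 characters each. As already mentioned, use strsep() (since strtok() is obsoleted by it, see man strtok). Note that strsep() is a BSD/glibc extension rather than standard C. It accepts a set of delimiters, so a single loop can split on both # and !.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    char *str, *cursor, *token;
    char strings[10][20];
    int i = 0;
    /* keep str for free(); strsep() advances cursor and finally sets it to NULL */
    str = cursor = strdup("dawddwamars#dawdjiawjd!fjejafi!djiajoa#jdawijd#");
    if (str == NULL) return 1;
    printf("%s\n", str);
    while (i < 10 && (token = strsep(&cursor, "#!")) != NULL) {
        if (*token == '\0') continue; /* skip empty pieces, e.g. after the trailing # */
        printf("%s\n", token);
        snprintf(strings[i], sizeof strings[i], "%s", token); /* truncates instead of overflowing */
        i++;
    }
    free(str);
    return 0;
}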

If you must use scanf()
#define Fmt1 "%19[^#!]"
#define Sep1 "%*[#!]"
char s1[20], s2[20], s3[20], s4[20], s5[20];
int count = scanf(" " Fmt1 Sep1 Fmt1 Sep1 Fmt1 Sep1 Fmt1 Sep1 Fmt1,
s1, s2, s3, s4, s5);
// count represents the number of successfully scanned strings.
// Expected range 0 to 5 and EOF.
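When the number of pieces is not known in advance, the fixed five-field format can be replaced by a loop that reads one piece per call. The following is only a sketch, assuming the input is already in memory so that sscanf() can be used instead of scanf(). The function name split_with_sscanf and the limits MAX_TOKENS and TOKEN_LEN are made up for illustration.

```c
#include <stdio.h>

#define MAX_TOKENS 10   /* assumed upper bound, not from the question */
#define TOKEN_LEN  20   /* 19 characters plus the terminating '\0' */

/* Hypothetical helper: splits input on '#' and '!' and stores up to
   MAX_TOKENS pieces in out. Returns the number of pieces stored.
   A piece longer than 19 characters is stored as several pieces. */
int split_with_sscanf(const char *input, char out[MAX_TOKENS][TOKEN_LEN])
{
    int count = 0;

    while (count < MAX_TOKENS) {
        int used = 0;

        /* Skip a run of separators, if any. %n records how many
           characters were consumed; used stays 0 when nothing matched. */
        sscanf(input, "%*[#!]%n", &used);
        input += used;

        /* Read one piece made of non-separator characters. */
        used = 0;
        if (sscanf(input, "%19[^#!]%n", out[count], &used) != 1)
            break;              /* end of input */
        input += used;
        count++;
    }
    return count;
}
```

Calling split_with_sscanf() on the sample string from the question should yield five pieces. The %n directive is what lets each call resume where the previous one stopped.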

scanf() has a lot of features that just aren't needed here.
To be more efficient, I would probably use strtok(), which seems ideal for this task.
Alternatively, I might just write C code to find the next # or ! using strchr() or a simple loop, and then just extract the tokens myself.
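The "extract the tokens myself" idea could look roughly like this. It is a sketch that uses strspn() and strcspn() instead of a hand-written loop and leaves the input string unmodified; the function name next_token and its parameters are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the next token in *cursor and
   stores its length in *len, or returns NULL when no tokens are left.
   The token is NOT null-terminated because the input is never modified. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *p = *cursor;

    p += strspn(p, delims);         /* skip leading delimiters */
    if (*p == '\0') {
        *cursor = p;
        return NULL;                /* nothing but delimiters was left */
    }
    *len = strcspn(p, delims);      /* characters up to the next delimiter */
    *cursor = p + *len;             /* resume here on the next call */
    return p;
}
```

A caller would loop until NULL is returned and copy each token with memcpy() or print it with printf("%.*s", (int)len, tok).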

Related

C - Segmentation fault using strtok

I have this code that reads multiple files and prints certain values. After reading a few files, my while loop stops at some point with a segmentation fault.
Here is my code:
int main () {
const char s[2] = ",";
const char s2[2] = ":";
char var1[] = "fiftyTwoWeekHigh\"";
char *fiftyhigh;
char *fiftyhigh2;
char *fiftyhigh_token;
char *fiftyhigh2_token;
char var2[] = "fiftyTwoWeekLow\"";
char *fiftylow;
char *fiftylow2;
char *fiftylow_token;
char *fiftylow2_token;
char var3[] = "regularMarketPrice\"";
char *price;
char *price2;
char *price_token;
char *price2_token;
FILE *fp;
char* data = "./data/";
char* json = ".json";
char line[MAX_LINES];
char line2[MAX_LINES];
int len;
char* fichier = "./data/indices.txt";
fp = fopen(fichier, "r");
if (fp == NULL){
printf("Impossible d'ouvrir le fichier %s", fichier);
return 1;
}
while (fgets(line, sizeof(line), fp) != NULL) {
char fname[10000];
len = strlen(line);
if (line[len-1] == '\n') {
line[len-1] = 0;
}
int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
if (ret < 0) {
abort();
}
printf("%s\n", fname);
FILE* f = fopen(fname, "r");
while ( fgets( line2, MAX_LINES, f ) != NULL ) {
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
price = strstr(line2, var3);
price_token = strtok(price, s);
price2 = strstr(price_token, s2);
price2_token = strtok(price2, s2);
printf("%s\n", price2_token);
//printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
}
fclose(f);
}
fclose(fp);
return 0;
}
and the output is :
./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault
It looks like my program stops because it can't find certain data in a certain file. Is there a way to allocate more memory? My MAX_LINES is already set to 6000.
I'm assuming that the lines in your file look something like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }
In other words it's some kind of JSON format. I'm assuming that the line starts with '{' so each line is a JSON object.
You read that line into line2, which now contains:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0
Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.
Now let's trace through the code here:
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
First you call strstr to find the position of "fiftyTwoWeekHigh". This will return a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0
Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips it and returns the next token. Because the string we're looking at ends after "100" and has no more colons, that token is the rest of the string, in other words, the number.
So you've gotten your number, but probably not in the way you expected? There was really no point in the second call to strtok since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
Now we try to find "fiftyTwoWeekLow":
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
This is basically the same process, and after you call strtok, line2 looks like this:
{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0
Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, then you'd have been unable to find it due to the \0 added after "fiftyTwoWeekHigh" earlier. In that case, strstr would have returned NULL. Passing NULL to strtok does not start a new search; it asks strtok to continue the previous tokenization, which is already exhausted at that point, so strtok would return NULL as well. Then you'd definitely have gotten a seg fault after passing NULL to strstr.
So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.
If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok then you should:
1. Read line2.
2. Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns NULL. This will break up the line into tokens that each look like "someField":100.
3. Isolate the field name and value from each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because it will change the internal state of strtok and you won't be able to use strtok(NULL, ",") to continue processing the line.
4. Test the field name, and depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother to strip off the quotes, just include them in the string you're comparing with.
5. Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.
You may be able to pass ",{}" as the delimiter to strtok in order to get rid of any open and close curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON only contains simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
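The strtok loop described above might be sketched as follows. This assumes flat, single-line JSON with numeric values and no whitespace around the field names, as in the example lines. The struct quote type and the function name parse_quote_line are hypothetical and not part of the original program.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three values of interest. */
struct quote {
    double high, low, price;
    int have_high, have_low, have_price;   /* 1 if the field was present */
};

/* Hypothetical helper: parses one line such as
   {"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100}
   The line is modified, because strtok overwrites delimiters with '\0'. */
void parse_quote_line(char *line, struct quote *q)
{
    static const char delims[] = ",{}\n";

    *q = (struct quote){0};
    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        char *colon = strchr(tok, ':');     /* strchr, not a nested strtok */
        if (colon == NULL)
            continue;                       /* not a "name":value pair */
        *colon = '\0';                      /* tok is now just the quoted name */
        const char *value = colon + 1;

        if (strcmp(tok, "\"fiftyTwoWeekHigh\"") == 0) {
            q->high = strtod(value, NULL);
            q->have_high = 1;
        } else if (strcmp(tok, "\"fiftyTwoWeekLow\"") == 0) {
            q->low = strtod(value, NULL);
            q->have_low = 1;
        } else if (strcmp(tok, "\"regularMarketPrice\"") == 0) {
            q->price = strtod(value, NULL);
            q->have_price = 1;
        }
    }
}
```

Because every field is looked at exactly once, the order of the fields no longer matters, and a missing field simply leaves its have_ flag at 0 instead of producing a NULL pointer.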
Did you mean '\0' ?
if (line[len-1] == '\n') {
line[len-1] = 0;
}
I advise you to use gdb to see where the segfault occurs and why.
I don't think you need to allocate more memory. The segfault may happen because one of the lookups found no data and you still print the result.
For example, use if (price2_token != NULL) printf("%s\n", price2_token);
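The NULL check can be pushed into a small lookup function so that every step is guarded. This is a sketch only, assuming the value is a plain number that directly follows the field name and a colon; the name find_number is invented here. Unlike the strtok version, it does not modify the line, so the lookups cannot interfere with each other.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks up a numeric field such as "fiftyTwoWeekHigh"
   (key includes the double quotes) without modifying line.
   Returns 1 and stores the number in *out if found, otherwise returns 0. */
int find_number(const char *line, const char *key, double *out)
{
    const char *p = strstr(line, key);
    if (p == NULL)
        return 0;                   /* field is missing from this line */

    p = strchr(p + strlen(key), ':');
    if (p == NULL)
        return 0;                   /* no colon after the field name */

    char *end;
    double value = strtod(p + 1, &end);
    if (end == p + 1)
        return 0;                   /* nothing numeric after the colon */

    *out = value;
    return 1;
}
```

The caller then prints a value only when find_number() returned 1, which avoids handing a NULL pointer to printf or strstr.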

How to compare strings in C without hard coding the increments?

I have a buffer that holds a string from a CSV file that I opened and read. I split the string up by using strtok() and split on the " , ". So now my string looks like this:
char buff[BUFFER_SIZE] = "1000" "CAP_SETPCAP" "CAP_NET_RAW"
I want to make comparisons now for each section of the string, but for the life of me I cannot get it to work. I want to be able to do it without hard coding anything meaning I don't want to assume how many spaces I need to move over. For example to start at CAP_SETPCAP I don't want to have to put buff+5. Anybody know a better way to handle this?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define BUFFER_SIZE 1024
int main(int argc, char *argv[]) {
FILE *fp = fopen("csvtest.csv", "r");
char buff[BUFFER_SIZE];
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
//or I wanted to do string comparison, but I kept getting
//segmentation fault (core dumped)
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
fclose(fp);
return 0;
}
Your code has three different sections. Let's analyze them:
1. The strtok section
You get the data from the file and then you iterate on strtok:
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
You don't seem interested in the tokens themselves: csvData is overwritten with each new token, and when the loop ends it is NULL.
The only lasting effect is that the commas in the original array buff have been overwritten with '\0'. If you print buff you will only see "1000", because strtok placed a string terminator right after this substring.
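The effect can be checked with a few lines. This is just an illustrative sketch, using an unquoted sample line and an invented function name, tokenize_and_discard, that repeats the loop from the question.

```c
#include <string.h>

/* Hypothetical helper: runs the same strtok loop as in the question and
   throws the tokens away. The only lasting effect is that every comma in
   buf has been overwritten with '\0'. */
void tokenize_and_discard(char *buf)
{
    char *tok = strtok(buf, ",");
    while (tok != NULL)
        tok = strtok(NULL, ",");
}
```

After calling it on a buffer holding 1000,CAP_SETPCAP,CAP_NET_RAW, strlen(buf) is 4 and strstr(buf, "CAP_NET_RAW") returns NULL, even though the bytes of CAP_NET_RAW are still stored at buf + 17.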
2. Searching "CAP_NET_RAW"
You now iterate on buff[i] until the string terminator. But the string terminator is after the first substring "1000"!
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
Furthermore, you search for CAP_NET_RAW, but even without the inner terminators the comparison would never succeed. That's because (1) the string actually present in buff is "CAP_NET_RAW" (with double quotes); (2) that token is the last of the row, and it will still have the trailing '\n' (fgets doesn't remove it).
By the way: I copied the code after your edit, and now there's no check on the strcmp() return value. I suppose it is a typo. Note: strcmp returns 0 if the strings match.
3. The strstr attempt
Finally you look for the string using the strstr function. That's a clever idea. But as already said, buff doesn't contain it. Well, the buffer actually does contain it, but string utilities stop at the first '\0' they find.
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
So found will be NULL, and dereferencing a NULL pointer (that's what %s tells printf to do) will lead to a segmentation fault.
4. Conclusions
As a very simple way to find the only string you care about, I suggest using only strstr, without calling strtok first. Alternatively you can still use strtok, but save the tokens in separate strings so that you can access them later.
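The second alternative, keeping the tokens so they can be compared later, might look like this. It is a sketch under a few assumptions: fields are separated by commas, may be wrapped in double quotes, and never contain a comma themselves. The names split_csv, has_field and MAX_FIELDS are invented for the example.

```c
#include <string.h>

#define MAX_FIELDS 16   /* assumed limit, not from the question */

/* Hypothetical helper: splits line in place on commas and stores a pointer
   to each field in fields[]. The trailing newline left by fgets and one
   pair of surrounding double quotes are removed. Returns the field count.
   Note that strtok skips empty fields such as the one in "a,,b". */
int split_csv(char *line, char *fields[], int max_fields)
{
    int n = 0;

    line[strcspn(line, "\r\n")] = '\0';     /* drop the line ending */
    for (char *tok = strtok(line, ","); tok != NULL && n < max_fields;
         tok = strtok(NULL, ",")) {
        size_t len = strlen(tok);
        if (len >= 2 && tok[0] == '"' && tok[len - 1] == '"') {
            tok[len - 1] = '\0';            /* remove closing quote */
            tok++;                          /* step over opening quote */
        }
        fields[n++] = tok;
    }
    return n;
}

/* Hypothetical helper: returns 1 if any stored field equals wanted. */
int has_field(char *fields[], int n, const char *wanted)
{
    for (int i = 0; i < n; i++)
        if (strcmp(fields[i], wanted) == 0)  /* strcmp returns 0 on a match */
            return 1;
    return 0;
}
```

With the tokens stored in fields[], no offsets such as buff+5 have to be hard-coded; each field is reached by its index.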

String manipulation using strtok/ sscanf in C

I'm trying to separate the following string into three separate variables, i.e., a, b and c.:
" mov/1/1/1,0 STR{7}, r7"
each need to hold a different segment of the string, e.g:
a = "mov/1/1/1,0"
b = "STR{7}"
c = "r7"
There may be a space or also a tab between each command; this what makes this code part trickier.
I tried to use strtok, for the string manipulation, but it didn't work out.
char command[50] = " mov/1/1/1,0 STR{7}, r7";
char a[10], b[10], c[10];
char * ptr = strtok(command, "\t");
strcpy(a, ptr);
ptr = strtok(NULL, "\t");
strcpy(b, ptr);
ptr = strtok(NULL, ", ");
strcpy(c, ptr);
but this gets things really messy as the variables a, b and c get to hold more values than they should, which leads the program to crash.
Input may vary from:
" mov/1/1/1,0 STR{7}, r7"
"jsr /0,0 PRTSTR"
"mov/1/1/0,0 STRADD{5}, LASTCHAR {r3} "
in which the values of a,b and c change to different part of the given string.
I was told it is safer to use sscanf than strtok for this kind of task, but I'm not sure why or how it could help me.
I would be more than glad to hear your opinion!
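This should do the trick, provided a, b and c are large enough (the first field alone is 11 characters, so char[10] is too small; the widths below assume char[20]):
sscanf(command, "%19s %19s %19s", a, b, c);
From the scanf manpage, %s skips leading white space, be it spaces or tabs, and then reads up to the next white space, so a trailing comma such as the one in "STR{7}," stays attached to the field: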
s : Matches a sequence of non-white-space characters; the next pointer
must be a pointer to character array that is long enough to hold the
input sequence and the terminating null byte ('\0'), which is added
automatically. The input string stops at white space or at the
maximum field width, whichever occurs first.
As you may know, sscanf() can be used the same way as scanf(); the difference is that sscanf scans from a string, while scanf reads from standard input.
In this problem you can tell scanf about a set of characters to "always skip", as done in this link.
Since each of the three strings has a different set of constraints, you can express them with %*[...] (a scanset whose match is read and discarded) before every %s inside sscanf().
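A minimal sscanf() sketch for the three sample inputs follows. It assumes each field holds at most 19 characters and that only trailing commas should be dropped; the function names split_command and strip_trailing_commas are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes any commas at the end of s. */
static void strip_trailing_commas(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && s[len - 1] == ',')
        s[--len] = '\0';
}

/* Hypothetical helper: splits command into up to three fields separated by
   spaces or tabs. Each buffer must hold at least 20 bytes.
   Returns the number of fields found (0 to 3). */
int split_command(const char *command, char a[20], char b[20], char c[20])
{
    a[0] = b[0] = c[0] = '\0';

    /* %19s skips leading white space and stops at the next white space
       or after 19 characters, so it cannot overflow a 20-byte buffer. */
    int n = sscanf(command, "%19s %19s %19s", a, b, c);
    if (n < 0)
        n = 0;                      /* sscanf returns EOF on empty input */

    strip_trailing_commas(a);
    strip_trailing_commas(b);
    strip_trailing_commas(c);
    return n;
}
```

The field widths are the reason sscanf can be the safer choice here: a field that is longer than expected is cut off rather than written past the end of the destination, which is what strcpy would do.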
I have reservations about using strtok(), but this code using it seems to do what you need. As I noted in a comment, the sample string "jsr /0,0 PRTSTR" throws a spanner in the works; it has a significant comma in the second field, whereas in the other two example strings, the comma in the second field is not significant. If you need to remove trailing commas, you can do that after the space-based splitting — as shown in this code. The second loop tests the zap_trailing_commas() function to ensure that it behaves under degenerate cases, zapping trailing commas but not underflowing the start of the buffer or anything horrid.
#include <stdio.h>
#include <string.h>
static void zap_trailing_commas(char *str)
{
size_t len = strlen(str);
while (len-- > 0 && str[len] == ',')
str[len] = '\0';
}
static void splitter(char *command)
{
char a[20], b[20], c[20];
char *ptr = strtok(command, " \t");
strcpy(a, ptr);
zap_trailing_commas(a);
ptr = strtok(NULL, " \t");
strcpy(b, ptr);
zap_trailing_commas(b);
ptr = strtok(NULL, " \t");
strcpy(c, ptr);
zap_trailing_commas(c);
printf("<<%s>> <<%s>> <<%s>>\n", a, b, c);
}
int main(void)
{
char data[][50] =
{
" mov/1/1/1,0 STR{7}, r7",
"jsr /0,0 PRTSTR",
"mov/1/1/0,0 STRADD{5}, LASTCHAR {r3} ",
};
for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++)
splitter(data[i]);
char commas[][10] = { "X,,,", "X,,", "X,", "X" };
for (size_t i = 0; i < sizeof(commas)/sizeof(commas[0]); i++)
{
printf("<<%s>> ", commas[i]);
zap_trailing_commas(&commas[i][1]);
printf("<<%s>>\n", commas[i]);
}
return 0;
}
Sample output:
<<mov/1/1/1,0>> <<STR{7}>> <<r7>>
<<jsr>> <</0,0>> <<PRTSTR>>
<<mov/1/1/0,0>> <<STRADD{5}>> <<LASTCHAR>>
<<X,,,>> <<X>>
<<X,,>> <<X>>
<<X,>> <<X>>
<<X>> <<X>>
I also tested a variant with commas in place of the X's and that left the single comma alone.

Remove the first part of a C String

I'm having a lot of trouble figuring this out. I have a C string, and I want to remove the first part of it. Let's say its: "Food,Amount,Calories". I want to copy out each one of those values, but not the commas. I find the comma, and return the position of the comma to my method. Then I use
strncpy(aLine.field[i], theLine, end);
To copy "theLine" to my array at position "i", with only the first "end" characters (for the first time, "end" would be 4, because that is where the first comma is). But then, because it's in a Loop, I want to remove "Food," from the array, and do the process over again. However, I cannot see how I can remove the first part (or move the array pointer forward?) and keep the rest of it. Any help would be useful!
What you need is to chop off strings with comma as your delimiter.
You need strtok to do this. Here's an example code for you:
#include <stdio.h>
#include <string.h>
int main (int argc, const char * argv[]) {
char *s = "asdf,1234,qwer";
char str[15];
strcpy(str, s);
printf("\nstr: %s", str);
char *tok = strtok(str, ",");
printf("\ntok: %s", tok);
tok = strtok(NULL, ",");
printf("\ntok: %s", tok);
tok = strtok(NULL, ",");
printf("\ntok: %s", tok);
return 0;
}
This will give you the following output:
str: asdf,1234,qwer
tok: asdf
tok: 1234
tok: qwer
strtok modifies the string it scans, so if you have to keep the original string, run strtok on a copy. If you don't need the original, you can replace each separator with '\0' yourself and use the resulting strings directly:
char s_RO[] = "abc,123,xxxx", *s = s_RO;
while (s){
char* old_str = s;
s = strchr(s, ',');
if (s){
*s = '\0';
s++;
};
printf("found string %s\n", old_str);
};
The function you might want to use is strtok()
Here is a nice example - http://www.cplusplus.com/reference/clibrary/cstring/strtok/
Personally, I would use strtok().
I would not recommend removing extracted tokens from the string. Removing part of a string requires copying the remaining characters, which is not very efficient.
Instead, you should keep track of your positions and just copy the sections you want to the new string.
But, again, I would use strtok().
If you know where the comma is, you can just keep reading the string from that point on.
For example:
void readTheString(const char *theLine)
{
    const char *wordStart = theLine;
    const char *wordEnd;
    for (;;)
    {
        wordEnd = wordStart;
        while (*wordEnd != ',' && *wordEnd != '\0') // stop at a comma or at the end of the string
            wordEnd++;
        // ... copy the (wordEnd - wordStart) characters starting at wordStart
        if (*wordEnd == '\0')
            break; // that was the last word
        wordStart = wordEnd + 1; // start the next word after the comma
    }
}
or something like that.
The inner loop has to stop at the terminating null as well as at a comma; otherwise it runs off the end of any string that does not end with a ','.
Anyway, using strtok would probably be a better idea.
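Tracking positions and copying each section, as suggested in the answers above, could be sketched like this. The source string is never modified and nothing is "removed" from it; the start pointer simply moves forward. The names copy_fields, MAX_COLUMNS and COLUMN_LEN are assumptions for the example and stand in for the asker's aLine.field array.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COLUMNS 8    /* assumed limits, not from the question */
#define COLUMN_LEN  32

/* Hypothetical helper: copies the comma-separated fields of line into
   fields[], truncating any field longer than COLUMN_LEN - 1 characters.
   Empty fields are kept. Returns the number of fields copied. */
int copy_fields(const char *line, char fields[MAX_COLUMNS][COLUMN_LEN])
{
    int n = 0;
    const char *start = line;

    while (n < MAX_COLUMNS) {
        size_t len = strcspn(start, ",");   /* length up to ',' or '\0' */
        size_t copy = len < COLUMN_LEN - 1 ? len : COLUMN_LEN - 1;

        memcpy(fields[n], start, copy);
        fields[n][copy] = '\0';             /* strncpy would not add this */
        n++;

        if (start[len] == '\0')
            break;                          /* last field reached */
        start += len + 1;                   /* move past the comma */
    }
    return n;
}
```

For "Food,Amount,Calories" this fills three entries. Advancing start by len + 1 is the "move the array pointer forward" step the question asks about.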

Tips on how to read last 'word' in a character array in C

Just looking to be pointed in the right direction:
I have standard input to a C program; I read one line at a time and store it in a char[].
Now that I have the char[], how do I take the last word (just assuming separated by a space) and then convert to lowercase?
I've tried this but it just hangs the program:
while (sscanf(line, "%s", word) == 1)
printf("%s\n", word);
Taken what was suggested and came up with this, is there a more efficient way of doing this?
char* last = strrchr(line, ' ')+1;
while (*last != '\0'){
*last = tolower(*last);
putchar((int)*last);
last++;
}
If I had to do this, I'd probably start with strrchr. That should get you the beginning of the last word. From there it's a simple matter of walking through characters and converting to lower case. Oh, there is the minor detail that you'd have to delete any trailing space characters first.
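Put together, the strrchr approach could look like the sketch below. It assumes words are separated by single spaces and that the line may still end in spaces or the newline left by fgets; the function name last_word_lower is invented for the example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: lowercases the last space-separated word of line
   in place and returns a pointer to it. Trailing spaces and newlines are
   removed first. If there is no space, the whole line is the last word. */
char *last_word_lower(char *line)
{
    size_t len = strlen(line);

    /* Delete trailing spaces and newlines so strrchr finds a real word. */
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\n'))
        line[--len] = '\0';

    char *space = strrchr(line, ' ');
    char *word = (space != NULL) ? space + 1 : line;

    /* tolower needs an unsigned char value to be well defined. */
    for (char *p = word; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);

    return word;
}
```

Checking the result of strrchr before adding 1 matters: on a one-word line it returns NULL, and NULL + 1 is not a valid pointer.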
The issue with your code is that it will repeatedly read the first word of the sentence into word. It will not move to the next word each time you call it. So if you have this as your code:
char * line = "this is a line of text";
Then every single time sscanf is called, it will load "this" into word. And since it read 1 word each time, sscanf will always return 1.
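This will help:
char dest[10], source[] = "blah blah blah!";
int sum = 0, index = 0;
/* %n stores how many characters this call consumed, so the next call resumes after the word just read */
while (sscanf(source + (sum += index), "%9s%n", dest, &index) == 1)
    ;
printf("%s\n", dest);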
'strtok' splits the input string on the delimiters you give it; in your case the delimiter would be a space. It does not return an array: each call returns the next "word", so you keep calling it until it returns NULL and remember the last word it gave you.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
One could illustrate many different methods of performing this operation and then compare their performance, usability, advantages and disadvantages. I simply wanted to illustrate what I mentioned above with a code snippet.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
int main(void)
{
    char line[] = "This is a sentence with a last WoRd ";
    char *lastWord = NULL;
    char *token = strtok(line, " ");
    while (token != NULL)
    {
        lastWord = token; // remember the most recent token
        token = strtok(NULL, " ");
    }
    // lastWord is still NULL if the line contained no words at all
    while (lastWord != NULL && *lastWord)
    {
        putchar(tolower((unsigned char)*lastWord++));
    }
    putchar('\n');
    return 0;
}
