Remove the first part of a C string

I'm having a lot of trouble figuring this out. I have a C string, and I want to remove the first part of it. Let's say it's "Food,Amount,Calories". I want to copy out each of those values, but not the commas. I find the comma and return its position to my method. Then I use
strncpy(aLine.field[i], theLine, end);
To copy "theLine" into my array at position "i", with only the first "end" characters (the first time, "end" would be 4, because that is where the first comma is). But then, because it's in a loop, I want to remove "Food," from the array and do the process over again. However, I can't see how to remove the first part (or move the array pointer forward?) and keep the rest of it. Any help would be useful!

What you need is to split the string using the comma as your delimiter.
You can use strtok to do this. Here's some example code:
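#include <stdio.h>
#include <string.h>
int main (int argc, const char * argv[]) {
    const char *s = "asdf,1234,qwer";
    char str[15];                   /* strtok modifies its input, so work on a copy */
    strcpy(str, s);
    printf("str: %s\n", str);
    char *tok = strtok(str, ",");   /* first call: pass the string */
    printf("tok: %s\n", tok);
    tok = strtok(NULL, ",");        /* later calls: pass NULL to continue */
    printf("tok: %s\n", tok);
    tok = strtok(NULL, ",");
    printf("tok: %s\n", tok);
    return 0;
}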
This will give you the following output:
str: asdf,1234,qwer
tok: asdf
tok: 1234
tok: qwer

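Note that strtok modifies the string it tokenizes (it writes '\0' over each separator), so if you have to keep the original string, copy it first, as above. Alternatively, you can replace each separator with '\0' yourself and use the resulting strings directly; this also modifies the buffer: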
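#include <stdio.h>
#include <string.h>
int main(void)
{
    char s_RO[] = "abc,123,xxxx", *s = s_RO;
    while (s) {
        char *old_str = s;
        s = strchr(s, ',');     /* find the next separator */
        if (s) {
            *s = '\0';          /* terminate the current token */
            s++;                /* continue after the separator */
        }
        printf("found string %s\n", old_str);
    }
    return 0;
}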

The function you probably want to use is strtok().
Here is a nice example: http://www.cplusplus.com/reference/clibrary/cstring/strtok/

Personally, I would use strtok().
I would not recommend removing extracted tokens from the string. Removing the front of a string in place means shifting all the remaining characters down (for example with memmove), which is not very efficient.
Instead, you should keep track of your positions and just copy the sections you want to the new string.
But, again, I would use strtok().

If you know where the comma is, you can just keep reading the string from that point on.
For example:
#include <stdio.h>
void readTheString(const char *theLine)
{
    const char *wordStart = theLine;
    while (*wordStart) // while we haven't reached the null terminator
    {
        const char *wordEnd = wordStart;
        while (*wordEnd != ',' && *wordEnd != '\0') // stop at a comma or at the end
            wordEnd++;
        // ... copy the substring ranging from wordStart to wordEnd (here we just print it)
        printf("%.*s\n", (int)(wordEnd - wordStart), wordStart);
        if (*wordEnd == ',')
            wordEnd++; // skip the comma
        wordStart = wordEnd; // start the next word
    }
}
Because the inner loop also stops at the null terminator, the last field is handled even when the string doesn't end with a ','.
Still, using strtok would probably be simpler.

Related

C - Segmentation fault using strtok

I have code that reads multiple files and prints certain values. After reading several files, my while loop stops at a certain point with a segmentation fault...
Here is my code
int main () {
const char s[2] = ",";
const char s2[2] = ":";
char var1[] = "fiftyTwoWeekHigh\"";
char *fiftyhigh;
char *fiftyhigh2;
char *fiftyhigh_token;
char *fiftyhigh2_token;
char var2[] = "fiftyTwoWeekLow\"";
char *fiftylow;
char *fiftylow2;
char *fiftylow_token;
char *fiftylow2_token;
char var3[] = "regularMarketPrice\"";
char *price;
char *price2;
char *price_token;
char *price2_token;
FILE *fp;
char* data = "./data/";
char* json = ".json";
char line[MAX_LINES];
char line2[MAX_LINES];
int len;
char* fichier = "./data/indices.txt";
fp = fopen(fichier, "r");
if (fp == NULL){
printf("Impossible d'ouvrir le fichier %s", fichier);
return 1;
}
while (fgets(line, sizeof(line), fp) != NULL) {
char fname[10000];
len = strlen(line);
if (line[len-1] == '\n') {
line[len-1] = 0;
}
int ret = snprintf(fname, sizeof(fname), "%s%s%s", data, line, json);
if (ret < 0) {
abort();
}
printf("%s\n", fname);
FILE* f = fopen(fname, "r");
while ( fgets( line2, MAX_LINES, f ) != NULL ) {
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
price = strstr(line2, var3);
price_token = strtok(price, s);
price2 = strstr(price_token, s2);
price2_token = strtok(price2, s2);
printf("%s\n", price2_token);
//printf("\n%s\t%s\t%s\t%s\t%s", line, calculcx(fiftyhigh2_token, price2_token, fiftylow2_token), "DIV-1", price2_token, "test");
}
fclose(f);
}
fclose(fp);
return 0;
}
and the output is :
./data/k.json
13.59
5.31
8.7
./data/BCE.json
60.14
46.03
56.74
./data/BNS.json
80.16
46.38
78.73
./data/BLU.json
16.68
2.7
Segmentation fault
It seems my program stops because it can't reach certain data in a certain file... Is there a way to allocate more memory? My MAX_LINES is already set to 6000.
I'm assuming that the lines in your file look something like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }
In other words it's some kind of JSON format. I'm assuming that the line starts with '{' so each line is a JSON object.
You read that line into line2, which now contains:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100, ... }\0
Note the \0 at the end that terminates the string. Note also that "fiftyTwoWeekLow" comes first, which turns out to be really important.
Now let's trace through the code here:
fiftyhigh = strstr(line2, var1);
fiftyhigh_token = strtok(fiftyhigh, s);
First you call strstr to find the position of "fiftyTwoWeekHigh". This will return a pointer to the position of that field name in the line. Then you call strtok to find the comma that separates this value from the next. I think that this is where things start to go wrong. After the call to strtok, line2 looks like this:
{"fiftyTwoWeekLow":32,"fiftyTwoWeekHigh":100\0 ... }\0
Note that strtok has modified the string: the comma has been replaced with \0. That's so you can use the returned pointer fiftyhigh_token as a string without seeing all the stuff that came after the comma.
fiftyhigh2 = strstr(fiftyhigh_token, s2);
fiftyhigh2_token = strtok(fiftyhigh2, s2);
printf("%s\n", fiftyhigh2_token);
Next you look for the colon and then call strtok with a pointer to the colon. Since the delimiter you're passing to strtok is the colon, strtok skips that leading colon and returns the next token, which (because the string we're looking at now ends after "100" and has no more colons) is the rest of the string, in other words, the number.
So you've gotten your number, but probably not in the way you expected? There was really no point in the second call to strtok since (assuming the JSON was well-formed) the position of "100" was just fiftyhigh2+1.
Now we try to find "fiftyTwoWeekLow":
fiftylow = strstr(line2, var2);
fiftylow_token = strtok(fiftylow, s);
fiftylow2 = strstr(fiftylow_token, s2);
fiftylow2_token = strtok(fiftylow2, s2);
printf("%s\n", fiftylow2_token);
This is basically the same process, and after you call strtok, line2 looks like this:
{"fiftyTwoWeekLow":32\0"fiftyTwoWeekHigh":100\0 ... }\0
Note that you're only able to find "fiftyTwoWeekLow" because it comes before "fiftyTwoWeekHigh" in the line. If it had come after, you'd have been unable to find it because of the \0 added after "fiftyTwoWeekHigh" earlier. In that case strstr would have returned NULL. Passing NULL as the first argument to strtok doesn't mean "no string"; it tells strtok to continue with the previous string, which here is already exhausted, so strtok would return NULL too, and you'd definitely get a seg fault when passing that NULL to strstr.
So the code is really sensitive to the order in which the fields appear in the line, and it's probably failing because some of your lines have the fields in a different order. Or maybe some fields are just missing from some lines, which would have the same effect.
If you're parsing JSON, you should really use a library designed for that purpose. But if you really want to use strtok, then you should:
1. Read line2.
2. Call strtok(line2, ",") once, then repeatedly call strtok(NULL, ",") in a loop until it returns NULL. This breaks the line into tokens that each look like "someField":100.
3. Isolate the field name and value in each of these tokens (just call strchr(token, ':') to find the value). Do not call strtok here, because that would change the internal state of strtok and you wouldn't be able to use strtok(NULL, ",") to continue processing the line.
4. Test the field name and, depending on its value, set an appropriate variable. In other words, if it's the "fiftyTwoWeekLow" field, set a variable called fiftyTwoWeekLow. You don't have to bother stripping off the quotes; just include them in the string you're comparing with.
5. Once you've processed all the tokens (strtok returns NULL), do something with the variables you set.
You may want to pass ",{}" as the delimiter to strtok in order to get rid of the open and close curly braces that surround the line. Or you could look for them in each token and ignore them if they appear.
You could also pass "\"{},:" as the delimiter to strtok. This would cause strtok to emit an alternating sequence of field names and values. You could call strtok once to get the field name, again to get the value, then test the field name and do something with the value.
Using strtok is a pretty primitive way of parsing JSON, but it will work as long as your JSON contains only simple field names and numbers and doesn't include any strings that themselves contain delimiter characters.
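Did you mean '\0' in the code below? (Assigning 0 or '\0' is equivalent, since both are the null character, so this is a matter of style rather than the cause of the crash.)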
if (line[len-1] == '\n') {
line[len-1] = 0;
}
I advise you to use gdb to see where the segfault occurs and why.
I don't think you need to allocate more memory. The segfault may happen because the data you're looking for isn't there and you still print the result: strstr and strtok then return NULL, and printing a NULL pointer with %s is undefined behavior.
Use if(price2_token!=NULL) printf("%s\n", price2_token); for example.

How to compare strings in C without hard coding the increments?

I have a buffer that holds a string from a CSV file that I opened and read. I split the string with strtok(), using "," as the delimiter. So now my string looks like this:
char buff[BUFFER_SIZE] = "1000" "CAP_SETPCAP" "CAP_NET_RAW"
I want to make comparisons now for each section of the string, but for the life of me I cannot get it to work. I want to be able to do it without hard coding anything meaning I don't want to assume how many spaces I need to move over. For example to start at CAP_SETPCAP I don't want to have to put buff+5. Anybody know a better way to handle this?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define BUFFER_SIZE 1024
int main(int argc, char *argv[]) {
FILE *fp = fopen("csvtest.csv", "r");
char buff[BUFFER_SIZE];
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
//or I wanted to do string comparison, but I kept getting
//segmentation fault (core dumped)
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
fclose(fp);
return 0;
}
Your code has three different sections. Let's analyze them:
1. The strtok section
You get the data from the file and then you iterate on strtok:
fgets(buff, 1024, fp);
char *csvData = strtok(buff, ",");
while(csvData != NULL){
csvData = strtok(NULL, ",");
}
You seem not interested in what you found in the different positions: in fact csvData is always overwritten with the last token. And at last it is equal to NULL.
The only thing you get is having the commas in the original array buff overwritten with '\0'. Printing buff you will only see "1000", because after this substring there is the string terminator placed by strtok.
2. Searching "CAP_NET_RAW"
You now iterate on buff[i] until the string terminator. But the string terminator is after the first substring "1000"!
int i;
while(buff[i] != '\0'){
strcmp(buff, "CAP_NET_RAW")
printf("Match found");
i++;
}
Furthermore, you search for CAP_NET_RAW, but even without the inner-terminators issue, the comparison would never succeed. That's because (1) the string actually present in buff is "CAP_NET_RAW" (with double quotes); (2) that token is the last of the row, and it will still have the trailing '\n' (fgets doesn't remove it).
By the way: I copied the code after your edit, and now there's no check on the strcmp() return value. I suppose it is a typo. Note: strcmp returns 0 if the strings match.
3. The strstr attempt
Finally you look for the string using the strstr function. That's a clever idea, but as already said, buff doesn't appear to contain it. Well, the buffer actually does contain it, but string functions stop at the first '\0' they find.
char *found;
found = strstr(buff, "CAP_NET_RAW");
printf("%s\n", found);
So found will be NULL, and dereferencing a NULL pointer (that's what %s tells printf to do) will lead to a segmentation fault.
4. Conclusions
As a very simple way to find the only string you care about, I suggest using only strstr, without calling strtok first. Alternatively, you can still use strtok, but save the tokens in separate strings so that you can access them later.

String manipulation using strtok/ sscanf in C

I'm trying to separate the following string into three separate variables, i.e., a, b and c.:
" mov/1/1/1,0 STR{7}, r7"
each need to hold a different segment of the string, e.g:
a = "mov/1/1/1,0"
b = "STR{7}"
c = "r7"
There may be a space or a tab between each part; this is what makes this part of the code trickier.
I tried to use strtok, for the string manipulation, but it didn't work out.
char command[50] = " mov/1/1/1,0 STR{7}, r7";
char a[10], b[10], c[10];
char * ptr = strtok(command, "\t");
strcpy(a, ptr);
ptr = strtok(NULL, "\t");
strcpy(b, ptr);
ptr = strtok(NULL, ", ");
strcpy(c, ptr);
but this gets really messy: the variables a, b and c end up receiving more characters than they can hold, which makes the program crash.
Input may vary from:
" mov/1/1/1,0 STR{7}, r7"
"jsr /0,0 PRTSTR"
"mov/1/1/0,0 STRADD{5}, LASTCHAR {r3} "
in which a, b and c correspond to different parts of the given string.
I was told it is safer to use sscanf than strtok for this kind of task, but I'm not sure why or how it could help me.
I would be more than glad to hear your opinion!
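Something like this should work, with a, b and c enlarged to char[20] ("mov/1/1/1,0" needs 12 bytes, so char[10] overflows, which alone can crash the program):
sscanf(command, "%19s %19[^, \t] , %19s", a, b, c)
Pass a, b and c rather than &a, &b and &c; an array name already converts to a pointer to its first element. The format "%s,%s,%s" cannot work, because %s does not stop at a comma: the first %s reads "mov/1/1/1,0", and the literal ',' in the format then fails to match the following space. Here %19[^, \t] reads up to a comma or blank, a space in the format skips any amount of white space, and the widths prevent overflow. The "jsr /0,0 PRTSTR" example doesn't follow this pattern, though (it yields "jsr", "/0" and "0").
From the scanf man page, %s stops at white space, be it spaces or tabs (and, like most conversions, it first skips leading white space):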
s : Matches a sequence of non-white-space characters; the next pointer
must be a pointer to character array that is long enough to hold the
input sequence and the terminating null byte ('\0'), which is added
automatically. The input string stops at white space or at the
maximum field width, whichever occurs first.
As you may know, sscanf() works the same way as scanf(); the difference is that sscanf scans from a string, while scanf reads from standard input.
In this problem you can give scanf a set of characters to "always skip", as done in this link.
Since you have a different set of constraints for each of the three strings, you can specify these constraints using %*[^...] before every %s inside sscanf().
I have reservations about using strtok(), but this code using it seems to do what you need. As I noted in a comment, the sample string "jsr /0,0 PRTSTR" throws a spanner in the works; it has a significant comma in the second field, whereas in the other two example strings, the comma in the second field is not significant. If you need to remove trailing commas, you can do that after the space-based splitting — as shown in this code. The second loop tests the zap_trailing_commas() function to ensure that it behaves under degenerate cases, zapping trailing commas but not underflowing the start of the buffer or anything horrid.
#include <stdio.h>
#include <string.h>
static void zap_trailing_commas(char *str)
{
size_t len = strlen(str);
while (len-- > 0 && str[len] == ',')
str[len] = '\0';
}
static void splitter(char *command)
{
char a[20], b[20], c[20];
char *ptr = strtok(command, " \t");
strcpy(a, ptr);
zap_trailing_commas(a);
ptr = strtok(NULL, " \t");
strcpy(b, ptr);
zap_trailing_commas(b);
ptr = strtok(NULL, " \t");
strcpy(c, ptr);
zap_trailing_commas(c);
printf("<<%s>> <<%s>> <<%s>>\n", a, b, c);
}
int main(void)
{
char data[][50] =
{
" mov/1/1/1,0 STR{7}, r7",
"jsr /0,0 PRTSTR",
"mov/1/1/0,0 STRADD{5}, LASTCHAR {r3} ",
};
for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++)
splitter(data[i]);
char commas[][10] = { "X,,,", "X,,", "X,", "X" };
for (size_t i = 0; i < sizeof(commas)/sizeof(commas[0]); i++)
{
printf("<<%s>> ", commas[i]);
zap_trailing_commas(&commas[i][1]);
printf("<<%s>>\n", commas[i]);
}
return 0;
}
Sample output:
<<mov/1/1/1,0>> <<STR{7}>> <<r7>>
<<jsr>> <</0,0>> <<PRTSTR>>
<<mov/1/1/0,0>> <<STRADD{5}>> <<LASTCHAR>>
<<X,,,>> <<X>>
<<X,,>> <<X>>
<<X,>> <<X>>
<<X>> <<X>>
I also tested a variant with commas in place of the X's and that left the single comma alone.

Cannot concatenate strtok's output variable. strcat and strtok

I’ve spent hours on this program and have put several hours online searching for alternatives to my methods and have been plagued with crashes and errors all evening…
I have a few things I'd like to achieve with this code. First I’ll explain my problems, then I’ll post the code and finally I’ll explain my need for the program.
The program outputs just the single words and the concatenate function does nothing. This seems like it should be simple enough to fix...
My first problem is that I cannot seem to get the concatenate function to work. I used the standard strcat function, which didn't work, and neither did another version I found on the internet (that function is used here; it is called "mystrcat"). I want the program to read in a string and remove "delimiters" to create a single string made up of every word in the original string. I am trying to do this with strtok and a strcat function. If there is an easier or simpler way, PLEASE, I'm all ears.
Another problem, which isn’t necessarily a problem but an ugly mess: the seven lines following main. I’d prefer to initialize my variables as follows: char variable[amt]; but the code I found for strtok was using a pointer and the code for the strcat function was using pointers. A better understanding of pointers && addresses for strings would probably help me out long term. However I would like to get rid of some of those lines by any means necessary. I can’t have 6 lines dedicated to only 2 variables. When I have 10 variables I do not want 30 lines up top…
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *mystrcat(char *output, char *firstptr);
int main() {
char str[] = "now # is the time for all # good men to come to the # aid of their country";
char delims[] = "# ";
char resultOrig[70]; //was [20];
char firstOrig[20];
//char *result = NULL, *first = NULL;
char result = resultOrig; //was result = &resultOrig;
char first = firstOrig; //was first = &firstOrig;
first = strtok( str, delims );
while( first != NULL ) {
mystrcat(resultOrig, firstOrig);
printf( "%s ", first );
printf("\n %s this should be the concat\'d string so far\n", resultOrig);
first = strtok( NULL, delims );
}
system("pause");
return 0;
}
char *mystrcat(char *resultptr, char *firstptr)
{
char *output = resultptr;
while (*output != '\0')
output++;
while(*firstptr != '\0')
{
*output = *firstptr;
output++;
firstptr++;
}
*output = '\0';
return output;
}
This is just a test program right now, but I intend to use this for a list/database of files. My files have underscores, hyphens, periods, parentheses, and numbers, all of which I would like to set as the “delimiters”. I was planning on going through a loop where I would delete a delimiter (each pass changing from _ to – to . etc.) and create a single string; I may want to replace the delimiters with a space or a period. Some files already have spaces in them along with the special characters I’d like to “delimit”.
I’m planning to do all this by means of scanning a text file. Within the file I also have a size in this format: “2,518,6452”. I’m hoping I can sort my database alphabetically or by size, ascending or descending. That’s just some additional information which may be useful to know for my specific questions above.
Below I have included some fictional samples of how these names could appear.
my_file(2009).ext
second.File-group1.extls
the.third.file-vol30.lmth
I am focusing this post on: the question on how to get the concatenate function working or an alternative to strcat and/or strtok. As well as asking for help to unclutter unnecessary or redundant code.
I appreciate all the help and even all those who read through my post.
Thank you so much!
strcat would work if you used first instead of firstOrig in your loop. There's no need for mystrcat. The code can be simplified to:
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "now # is the time for all # good men to come to the # aid of their country";
char delims[] = "# ";
char result[100] = ""; /* Original size was too small */
char* token;
token = strtok(str, delims);
while(token != NULL) {
printf("token = '%s'\n", token);
strcat(result, token);
token = strtok(NULL, delims);
}
printf("%s\n", result);
return 0;
}
Output:
token = 'now'
token = 'is'
token = 'the'
token = 'time'
token = 'for'
token = 'all'
token = 'good'
token = 'men'
token = 'to'
token = 'come'
token = 'to'
token = 'the'
token = 'aid'
token = 'of'
token = 'their'
token = 'country'
nowisthetimeforallgoodmentocometotheaidoftheircountry
There are several problems here:
1. Missing initializations for resultOrig and firstOrig (as codaddict pointed out).
2. first = &firstOrig doesn't do what you want. You later do first = strtok(str, delims), which sets first to point somewhere inside str. It doesn't read data into firstOrig.
3. You allocate small buffers (just 20 bytes) and try to fill them with much more than that. This overflows buffers on the stack, causing nasty bugs.
You've not initialized the following two strings:
char resultOrig[20];
char firstOrig[20];
and you are appending characters to them. Change them to:
char resultOrig[20] = "";
char firstOrig[20] = "";
Also the name of the character array gives its starting address. So
result = &resultOrig;
first = &firstOrig;
should be:
result = resultOrig;
first = firstOrig;
Change
mystrcat(resultOrig, firstOrig);
to
mystrcat(resultOrig, first);
Also make resultOrig large enough to hold the concatenation, like:
char resultOrig[100] = "";

String tokenizer in c

The following code breaks down the string command at every space (" ") and at every full stop ("."). What if I want to break down command only where a space and a full stop occur together, rather than at each one by itself? For example, a command like 'hello .how are you today' would be broken into these pieces (ignoring the quotes):
[hello]
[how are you today]
char *token2 = strtok(command, " .");
You can do it pretty easily with strstr:
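#include <string.h>
/* Like strtok, but splits on the whole string delim, not on any of its characters. */
char *strstrtok(char *str, const char *delim)
{
    static char *prev;              /* position saved between calls, as in strtok */
    if (!str) str = prev;
    if (str) {
        char *end = strstr(str, delim);
        if (end) {
            prev = end + strlen(delim);
            *end = 0;
        } else {
            prev = 0;
        }
    }
    return str;
}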
This is pretty much exactly the same as the implementation of strtok, just calling strstr and strlen instead of strcspn and strspn. It also might return empty tokens (if there are two consecutive delimiters or a delimiter at either end); you can arrange to ignore those if you would prefer.
Your best bet might just be to crawl your input with strstr, which finds occurrences of a substring, and manually tokenize on those.
It's a common question, but I've yet to see a particularly elegant solution. The above is straightforward and workable, however.
