Get elements from string - c

I'm stumped in a rather trivial thing...
So, basically I want the "words" between the first one and the last one to go to data and the last one to go to key.
C-POSIX only, pls.
Is strtok_r the way to go or I'm way off on this? Something else?
char *key = NULL, *data=NULL, *save=NULL;
char comando[1024];
fgets(comando, 512, stdin);
strtok_r(comando, " ",&save);
while(strcmp(save,"\n")){
strcat(data,strtok_r(NULL," ",&save));
}
key = strtok_r(NULL, "\n",&save);
P.S: comando is 1024 as memory is not a problem and better safe than sorry. fgets reads 512 'cause that's the char line limit on standard unix terminal.

Your code will crash on this line:
strcat(data,strtok_r(NULL," ",&save));
Because you never reserved space for data. strcat will try to write to a NULL memory address.
Another thing to note is that you shouldn't rely on save to check for the end of the line. According to strtok's manpage:
The saveptr argument is a pointer to a char * variable that is used
internally by strtok_r() in order to maintain context between
successive calls that parse the same string.
Relying on the value of saveptr outside of strtok_r breaks the abstraction layer, you shouldn't assume anything about how strtok uses saveptr. It's bad practice.
A slightly better approach is to keep a pointer to the previous token returned by strtok, and a pointer to the current token. When strtok returns NULL, meaning there are no more tokens, then prev will point to the last token, which is your key. Here's some code:
char *key = NULL, *save=NULL;
char *prev, *curr;
char comando[1024];
char data[1024];
data[0] = '\0';
fgets(comando, 512, stdin);
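strtok_r(comando, " \n", &save); // skip the first word
prev = curr = strtok_r(NULL, " \n", &save);
while (curr != NULL) {
prev = curr;
curr = strtok_r(NULL, " \n", &save); // '\n' is a delimiter too, so key has no trailing newline
if (curr != NULL)
strcat(data, prev); // prev is not the last word, so it belongs to data
}
key = prev;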
Note that I allocated space for data by declaring it as array instead of pointer. The instruction
data[0] = '\0';
is there to make sure that strcat finds the null terminating byte in the first call.
You can replace the use of prev directly by key, I left it that way to make the code more readable.
A word of advice: always remember that strtok modifies its argument destructively (you lose the identity of the delimiting bytes), and that you can't call it with constant strings.
Note: data will contain every word concatenated, so you lose the spaces. I'm not sure if this is what you want. If it's not, you might want to use something better than strcat (which is not very efficient, btw). For example, you could use sprintf to print the token into data with a leading space, and keep a pointer to the next free position in data.
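The idea in that note (print each token into data with a separator and remember the next free position) can be sketched roughly as follows. The name split_command and its signature are made up for illustration. It uses snprintf rather than sprintf so the write is bounded, and it treats '\n' as a delimiter so the key carries no trailing newline.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: skips the first word of line, joins the middle
 * words into data (one space between them) and returns the last word.
 * line is modified in place by strtok_r. Returns NULL if there are
 * fewer than two words or if data is too small. */
char *split_command(char *line, char *data, size_t data_size)
{
    char *save = NULL;
    char *prev = NULL;
    char *curr;
    size_t used = 0;                      /* next free position in data */

    assert(line != NULL && data != NULL && data_size > 0);
    data[0] = '\0';

    if (strtok_r(line, " \n", &save) == NULL)
        return NULL;                      /* empty line: no first word */

    while ((curr = strtok_r(NULL, " \n", &save)) != NULL) {
        if (prev != NULL) {
            /* prev is not the last word, so it goes into data */
            int n = snprintf(data + used, data_size - used, "%s%s",
                             used ? " " : "", prev);
            if (n < 0 || (size_t)n >= data_size - used)
                return NULL;              /* data would overflow */
            used += (size_t)n;
        }
        prev = curr;
    }
    return prev;                          /* last word, or NULL if none */
}
```

Called as key = split_command(comando, data, sizeof data), the returned pointer points into comando, so it stays valid only as long as that buffer does.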

I would suggest replacing your loop with the following code (printf() is used just for testing):
strtok_r(comando, " ", &save);
char *res = NULL;
while (NULL != (res = strtok_r(NULL, " ", &save))) {
if (key != NULL) {
//strcat(data, key); // FIXME
printf("data = %s\n", key);
}
key = res;
}
printf("key = %s\n", key);
Also, strcat() should not be used with NULL arguments: it leads to a crash. So the data pointer should point to some array. Output of running the code:
┌─(16:08:22)─(michael#lorry)─(~/tmp/strtok)
└─► gcc -o main main.c; echo "one two three four five" | ./main
data = two
data = three
data = four
key = five

Lots wrong with your code
char *key = NULL, *data=NULL, *save=NULL;
Later on, you are using strcat to add strings to data but you have allocated no storage to data. That will cause a segmentation fault.
fgets(comando, 512, stdin);
fgets will read at most one less than the number passed to it. So, if the user does type in 512 characters, the string will have no terminating \n. Also, the only way to detect an error or end of file is to check the return result of fgets. If it's NULL either you have reached end of file (user has hit ctrl-d) or there is an error. In either case, the content of your buffer is indeterminate.
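A minimal sketch of those checks, assuming a made-up helper name read_line: it tests the return value of fgets and reports whether the newline was actually seen, which is how a truncated line can be told apart from a complete one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf.
 * Returns 1 for a complete line (the newline is stripped),
 * 0 if the line did not fit or the input ended without a newline,
 * -1 on end of file or read error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;

    assert(fp != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;                 /* nothing usable in buf */

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';       /* drop the trailing newline */
        return 1;
    }
    return 0;                      /* no newline: truncated or last line */
}
```

Passing sizeof comando as the size, instead of a separate constant such as 512, keeps the limit tied to the real buffer.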
while(strcmp(save,"\n"))
I don't think you are allowed to rely on the assumption that your save pointer will point to the rest of the unconsumed string.
strtok_r(comando, " ",&save);
strtok_r signals that it has reached the end of the data by returning a NULL pointer. You can't throw away the return result without looking at it. Also, this will consume the trailing \n as part of the last token.
strcat(data,strtok_r(NULL," ",&save));
As I said before, data is a null pointer. Also, strtok_r can return NULL, and passing NULL to strcat is undefined behavior.
I would do something more like:
char* currentTok = strtok_r(comando, " \n", &save); // separator is space or \n
char* previousTok = NULL;
while (currentTok != NULL)
{
if (previousTok != NULL)
{
// save previousTok in data unless it's the first token
}
previousTok = currentTok;
currentTok = strtok_r(NULL, " \n", &save);
}
char* key = previousTok;

Related

Issue when using pointer to line tokens in C

I have created a program that requires reading a CSV file that contains bank accounts and transaction history. To access certain information, I have a function getfield which reads each line token by token:
const char* getfield(char* line, int num)
{
const char *tok;
for (tok = strtok(line, ",");
tok && *tok;
tok = strtok(NULL, ",\n"))
{
if (!--num)
return tok;
}
return NULL;
}
I use this later on in my code to access the account number (at position 2) and the transaction amount(position 4):
...
while (fgets(line, 1024, fp))
{
char* tmp = strdup(line);
//check if account number already exists
char *acc = (char*) getfield(tmp, 2);
char *txAmount = (char*)getfield(tmp, 4);
printf("%s\n", txAmount);
//int n =1;
if (acc!=NULL && atoi(acc)== accNum && txAmount !=NULL){
if(n<fileSize)
{
total[n]= (total[n-1]+atof(txAmount));
printf("%f", total[n]);
n++;
}
}
free(tmp1); free(tmp2);
}
...
No issue seems to arise with char *acc = (char*) getfield(tmp, 2), but when I use getfield for char *txAmount = (char*)getfield(tmp, 4) the print statement that follows shows me that I always have NULL. For context, the file currently reads as (first line is empty):
AC,1024,John Doe
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
I had previously asked if it was required to use free(acc) in a separate part of my code (Free() pointer error while casting from const char*) and the answer seemed to be no, but I'm hoping this question gives better context. Is this a problem with not freeing up txAmount? Any help is greatly appreciated !
(Also, if anyone has a better suggestion for the title, please let me know how I could have better worded it, I'm pretty new to stack overflow)
Your getfield function modifies its input. So when you call getfield on tmp again, you aren't calling it on the right string.
For convenience, you may want to make a getfield function that doesn't modify its input. It will be inefficient, but I don't think performance or efficiency are particularly important to your code. The getfield function would call strdup on its input, extract the string to return, call strdup on that, free the duplicate of the original input, and then return the pointer to the duplicate of the found field. The caller would have to free the returned pointer.
The issue is that strtok replaces the found delimiters with '\0'. You'll need to get a fresh copy of the line.
Or continue where you left off, using getfield (NULL, 2).
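The non-modifying variant described above could look roughly like this. The name getfield_copy is hypothetical; it follows the strdup, extract, strdup, free sequence from that answer and, like the original, skips empty fields because it still uses strtok internally.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of getfield: returns a malloc'ed copy of field
 * number num (1-based) of line, or NULL if there is no such field or
 * memory runs out. line itself is left untouched; the caller must
 * free the returned pointer. */
char *getfield_copy(const char *line, int num)
{
    char *work;
    char *tok;
    char *result = NULL;

    assert(line != NULL && num >= 1);
    work = strdup(line);               /* private copy for strtok to chop up */
    if (work == NULL)
        return NULL;

    for (tok = strtok(work, ",\n"); tok != NULL; tok = strtok(NULL, ",\n")) {
        if (--num == 0) {
            result = strdup(tok);      /* copy survives free(work) below */
            break;
        }
    }
    free(work);
    return result;
}
```

With this, getfield_copy(tmp, 2) and getfield_copy(tmp, 4) can be called in any order on the same line, at the cost of two allocations per call.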

How does the compiler allocate memory for an array of strings in C?

I typed up this block of code for an assignment:
char *tokens[10];
void parse(char* input);
void main(void)
{
char input[] = "Parse this please.";
parse(input);
for(int i = 2; i >= 0; i--) {
printf("%s ", tokens[i]);
}
}
void parse(char* input)
{
int i = 0;
tokens[i] = strtok(input, " ");
while(tokens[i] != NULL) {
i++;
tokens[i] = strtok(NULL, " ");
}
}
But, looking at it, I'm not sure how the memory allocation works. I didn't define the length of the individual strings as far as I know, just how many strings are in the string array tokens (10). Do I have this backwards? If not, then is the compiler allocating the length of each string dynamically? In need of some clarification.
strtok is a bad citizen.
For one thing, it retains state, as you've implicitly used when you call strtok(NULL,...) -- this state is stored in the private memory of the Standard C Library, which means only single threaded programs can use strtok. Note that there is a reentrant version called strtok_r in some libraries.
For another, and to answer your question, strtok modifies its input. It doesn't allocate space for the strings; it writes NUL characters in place of your delimiter in the input string, and returns a pointer into the input string.
You are correct that strtok can return more than 10 results. You should check for that in your code so you don't write beyond the end of tokens. A reliable program would either set an upper limit, like your 10, and check for it, reporting an error if it's exceeded, or dynamically allocate the tokens array with malloc, and realloc it if it gets too big. Then the error occurs only when you run out of memory.
Note that you can also work around the problem of strtok modifying your input string by strduping before passing it to strtok. Then you'll have to free the new string after both it and tokens, which points to it, are going out of scope.
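As a rough sketch of the malloc/realloc option mentioned above (the function name split_tokens and the starting capacity of 4 are arbitrary choices for illustration):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits input in place on the characters in delim
 * and returns a malloc'ed, NULL-terminated array of pointers into input.
 * The number of tokens is stored in *count. Returns NULL if memory runs
 * out. The caller frees the array, not the individual tokens. */
char **split_tokens(char *input, const char *delim, size_t *count)
{
    size_t cap = 4;
    size_t n = 0;
    char **tokens;
    char *tok;

    assert(input != NULL && delim != NULL && count != NULL);
    tokens = malloc(cap * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    for (tok = strtok(input, delim); tok != NULL; tok = strtok(NULL, delim)) {
        if (n + 1 >= cap) {            /* keep one slot for the final NULL */
            char **grown;
            cap *= 2;
            grown = realloc(tokens, cap * sizeof *tokens);
            if (grown == NULL) {
                free(tokens);
                return NULL;
            }
            tokens = grown;
        }
        tokens[n++] = tok;             /* points into input, no copy made */
    }
    tokens[n] = NULL;
    *count = n;
    return tokens;
}
```

Only the pointer array is allocated here; the characters still live in input, so input must outlive the returned array.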
tokens is an array of pointers.
The distinction between strings and pointers is often fuzzy. In some situations strings are better thought of as arrays, in other situations as pointers.
Anyway... in your example input is an array and tokens is an array of pointers to a place within input.
The data inside input is changed with each call to strtok()
So, step by step
// input[] = "foo bar baz";
tokens[0] = strtok(input, " ");
// input[] = "foo\0bar baz";
//            ^-- tokens[0] points here
tokens[1] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                 ^-- tokens[1] points here
tokens[2] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                      ^-- tokens[2] points here
// next strtok returns NULL

C String parsing errors with strtok(),strcasecmp()

So I'm new to C and the whole string manipulation thing, but I can't seem to get strtok() to work. It seems everywhere everyone has the same template for strtok being:
char* tok = strtok(source,delim);
do
{
{code}
tok=strtok(NULL,delim);
}while(tok!=NULL);
So I try to do this with the delimiter being the space character, and it seems that strtok() not only returns NULL after the first run (the first entry into the while/do-while) no matter how big the string, but it also seems to wreck the source, turning the source string into the same thing as tok.
Here is a snippet of my code:
char* str;
scanf("%ms",&str);
char* copy = malloc(sizeof(str));
strcpy(copy,str);
char* tok = strtok(copy," ");
if(strcasecmp(tok,"insert"))
{
printf(str);
printf(copy);
printf(tok);
}
Then, here is some output for the input "insert a b c d e f g"
aaabbbcccdddeeefffggg
"Insert" seems to disappear completely, which I think is the fault of strcasecmp(). Also, I would like to note that I realize strcasecmp() seems to all-lower-case my source string, and I do not mind. Anyhoo, input "insert insert insert" yields absolutely nothing in output. It's as if those functions just eat up the word "insert" no matter how many times it is present. I may* end up just using some of the C functions that read the string char by char but I would like to avoid this if possible. Thanks a million guys, i appreciate the help.
With the second snippet of code you have five problems: The first is that your format for the scanf function is not standard C. The 'm' modifier is a POSIX.1-2008 extension that makes scanf allocate the buffer for you, so the code only works where that extension is supported. Also note that %s stops reading at the first whitespace, so at most one word ends up in str. (See e.g. here for a good reference of the standard function.)
The second problem shows up as soon as you switch to the portable plain %s format: you use the address-of operator on a pointer, which means that you pass a pointer to a pointer to a char (i.e. char**) to the scanf function. As you know, the scanf function wants its arguments as pointers, but since strings (either in pointer to character form, or in array form, which decays to a pointer) already are pointers, you don't use the address-of operator for plain %s arguments. Only the %ms form expects a char**.
The third problem, once you fix the previous problem, is that the pointer str is uninitialized. You have to remember that uninitialized local variables are truly uninitialized, and their values are indeterminate. In practice, their values will be seemingly random, so str will point to some "random" memory.
The fourth problem is with the malloc call, where you use the sizeof operator on a pointer. This returns the size of the pointer and not the size of what it points to.
The fifth problem follows from the fourth: copy only has room for a pointer's worth of bytes (typically 4 or 8, depending on whether you're on a 32- or 64-bit platform), so the strcpy into it overflows as soon as the string, including its terminator, is longer than that, and strtok then works on corrupted memory.
So, five problems in only four lines of code. That's pretty good! ;)
It looks like you're trying to print space delimited tokens following the word "insert" 3 times. Does this do what you want?
#include <stdio.h>
#include <string.h>
#include <strings.h> // strcasecmp
#include <stdlib.h>
int main(int argc, char **argv)
{
char str[BUFSIZ] = {0};
char *copy;
char *tok;
int i;
// safely read a string and chop off any trailing newline
if(fgets(str, sizeof(str), stdin)) {
int n = strlen(str);
if(n && str[n-1] == '\n')
str[n-1] = '\0';
}
// copy the string so we can trash it with strtok
copy = strdup(str);
// look for the first space-delimited token
tok = strtok(copy, " ");
// check that we found a token and that it is equal to "insert"
if(tok && strcasecmp(tok, "insert") == 0) {
// iterate over all remaining space-delimited tokens
while((tok = strtok(NULL, " "))) {
// print the token 3 times
for(i = 0; i < 3; i++) {
fputs(tok, stdout);
}
}
putchar('\n');
}
free(copy);
return 0;
}

C: Parse empty tokens from a string with strtok

My application produces strings like the one below. I need to parse values between the separator into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
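A minimal sketch of that strchr plus memcpy approach, under the assumption that you want one field at a time by index (copy_field and its parameters are made-up names):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies field number index (0-based) of src,
 * split on the single character sep, into out. Empty fields count.
 * Returns 1 on success, 0 if there is no such field or out is too
 * small. src is never modified, so it may be a string literal. */
int copy_field(const char *src, char sep, size_t index,
               char *out, size_t out_size)
{
    const char *p1 = src;

    assert(src != NULL && out != NULL && out_size > 0 && sep != '\0');
    for (;;) {
        const char *p2 = strchr(p1, sep);            /* end of this field */
        size_t len = p2 ? (size_t)(p2 - p1) : strlen(p1);

        if (index == 0) {
            if (len >= out_size)
                return 0;                            /* would not fit */
            memcpy(out, p1, len);
            out[len] = '\0';
            return 1;
        }
        if (p2 == NULL)
            return 0;                                /* ran out of fields */
        p1 = p2 + 1;                                 /* step past separator */
        index--;
    }
}
```

For the sample string in the question, index 4 yields the empty field between the two separators and index 5 yields 5523, which is the numbering the question asks for.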
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects
a C string as argument for str, whose
first character is used as the
starting location to scan for tokens.
In subsequent calls, the function
expects a null pointer and uses the
position right after the end of last
token as the new starting location for
scanning.
To determine the beginning and the end
of a token, the function first scans
from the starting location for the
first character not contained in
delimiters (which becomes the
beginning of the token). And then
scans starting from this beginning of
the token for the first character
contained in delimiters, which becomes
the end of the token.
What this says is that it will skip any '|' characters at the beginning of a token, making 5523 the 5th token, which you already knew. Just thought I would explain why (I had to look it up myself). This also says that you will not get any empty tokens.
Since your data is set up this way you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
char *mystrtok(char **m,char *s,char c)
{
char *p=s?s:*m;
if( !*p )
return 0;
*m=strchr(p,c);
if( *m )
*(*m)++=0;
else
*m=p+strlen(p);
return p;
}
reentrant
threadsafe
strictly ANSI conform
needs an unused help-pointer from calling
context
e.g.
char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
puts(t);
e.g.
char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
char *p1,*t1;
for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
puts(t1);
}
your work :)
implement char *c as parameter 3
Look into using strsep instead: strsep reference
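For illustration, a small sketch of how strsep keeps empty fields. Note that strsep is a BSD/glibc extension rather than ISO C or POSIX, and the wrapper name split_keep_empty is made up here.

```c
#define _DEFAULT_SOURCE  /* exposes strsep on glibc; it is not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: splits s in place on any character in delim,
 * storing up to max field pointers in fields. Unlike strtok, adjacent
 * delimiters produce empty fields. Returns the number of fields stored. */
size_t split_keep_empty(char *s, const char *delim, char **fields, size_t max)
{
    size_t n = 0;
    char *rest = s;       /* strsep advances this; NULL after the last field */
    char *tok;

    assert(s != NULL && delim != NULL && fields != NULL);
    while (n < max && (tok = strsep(&rest, delim)) != NULL)
        fields[n++] = tok;
    return n;
}
```

Like strtok, this overwrites the delimiters in s with '\0', so s must be a writable array rather than a string literal.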
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
static char *current; // just as ugly as strtok!
char *pos, *ret;
if (input != NULL)
current = input;
if (current == NULL)
return current;
ret = current;
pos = strpbrk(current, delim);
if (pos == NULL)
current = NULL;
else {
*pos = '\0';
current = pos+1;
}
return ret;
}
Inspired by Patrick Schlüter's answer, I made this function. It is supposed to be thread safe, support empty tokens, and leave the original string unchanged. Each returned token is malloc'ed, so the caller has to free it.
char* strTok(char** newString, char* delimiter)
{
char* string = *newString;
char* delimiterFound = (char*) 0;
int tokLenght = 0;
char* tok = (char*) 0;
if(!string) return (char*) 0;
delimiterFound = strstr(string, delimiter);
if(delimiterFound){
tokLenght = delimiterFound-string;
}else{
tokLenght = strlen(string);
}
tok = malloc(tokLenght + 1);
memcpy(tok, string, tokLenght);
tok[tokLenght] = '\0';
*newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
return tok;
}
you can use it like
char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
printf("%s\n", tok);
free(tok); // each token was malloc'ed by strTok
}
This is supposed to output the following (the fourth and sixth tokens are empty, so they print as blank lines):
1
2
3

5

I tested it for simple strings but didn't use it in production yet, and posted it on code review too, so you can see what do others think about it
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
//Store the parameter into a string - saves memory.
strcpy(strAccInfo,lr_eval_string("{test_Param}"));
//Get the first instance of the separator "|" in the string
p2 = (char *) strchr(strAccInfo,'|');
//Start a loop - Set the max loop value to more than max expected.
for (iLoop = 1;iLoop<200;iLoop++) {
//Save parameter names in sequence.
lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
//Get the first instance of the separator "|" in the string (within the loop).
p2 = (char *) strchr(strAccInfo,'|');
//Save the value for the parameters in sequence.
lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
//Save string after the first instance of p2, as strAccInfo - for looping.
strcpy(strAccInfo,p2+1);
//Start conditional loop for checking for last value in the string.
if (strchr(strAccInfo,'|')==NULL) {
lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
iLoop = 200;
}
}
}

strcat() new line, duplicate string

I'm writing a function that gets the path environment variable of a system, splits up each path, then concats on some other extra characters onto the end of each path.
Everything works fine until I use the strcat() function (see code below).
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy = (char *)malloc(strlen(path) + 1);
char* token[80];
int j, i=0; // used to iterate through array
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":"); //get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
for(j = 0; j <= i-1; j++) {
strcat(token[j], "/");
//strcat(token[j], exeName);
printf("%s\n", token[j]); //print out all of the tokens
}
}
My shell output is like this (I'm concatenating "/which" onto everything):
...
/usr/local/applic/Maple/bin/which
which/which
/usr/local/applic/opnet/8.1.A.wdmguru/sys/unix/bin/which
which/which
Bus error (core dumped)
I'm wondering why strcat is displaying a new line and then repeating which/which.
I'm also wondering about the Bus error (core dumped) at the end.
Has anyone seen this before when using strcat()?
And if so, anyone know how to fix it?
Thanks
strtok() does not give you a new string.
It mutilates the input string by inserting the char '\0' where the split character was.
So your use of strcat(token[j],"/") will put the '/' character where the '\0' was.
Also the last token will start appending 'which' past the end of your allocated memory into uncharted memory.
You can use strtok() to split a string into chunks. But if you want to append anything onto a token you need to make a copy of the token otherwise what your appending will spill over onto the next token.
Also you need to take more care with your memory allocation you are leaking memory all over the place :-)
PS. If you must use C-Strings. use strdup() to copy the string.
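char* prependPath( char* exeName )
{
char* path = getenv("PATH");
if (path == NULL) return NULL; // PATH is not set
char* pathDeepCopy = strdup(path);
if (pathDeepCopy == NULL) return NULL;
char* token[80];
int j, i = 0; // used to iterate through array
token[i] = strtok(pathDeepCopy, ":");
while ((token[i] != NULL) && (i < 79)) // stop before running off the end of token
{
++i;
token[i] = strtok(NULL, ":");
}
for(j = 0; j < i; ++j) // token[i] is NULL (or the array is full), so stop before it
{
// room for the token, the '/', the exe name and the terminating '\0'
char* tmp = (char*)malloc(strlen(token[j]) + 1 + strlen(exeName) + 1);
if (tmp == NULL) break;
strcpy(tmp,token[j]);
strcat(tmp,"/");
strcat(tmp,exeName);
printf("%s\n",tmp); //print out all of the tokens
free(tmp);
}
free(pathDeepCopy);
return NULL; // this version only prints; nothing is handed back to the caller
}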
strtok does not duplicate the token but instead just points to it within the string. So when you cat '/' onto the end of a token, you're writing a '\0' either over the start of the next token, or past the end of the buffer.
Also note that even if strtok did return copies of the tokens instead of the originals (which it doesn't), it wouldn't allocate the additional space for you to append characters, so it'd still be a buffer overrun bug.
strtok() tokenizes in place. When you start appending characters to the tokens, you're overwriting the next token's data.
Also, in general it's not safe to simply concatenate to an existing string unless you know that the size of the buffer the string is in is large enough to hold the resulting string. This is a major cause of bugs in C programs (including the dreaded buffer overflow security bugs).
So even if strtok() returned brand-new strings unrelated to your original string (which it doesn't), you'd still be overrunning the string buffers when you concatenated to them.
Some safer alternatives to strcpy()/strcat() that you might want to look into (you may need to track down implementations for some of these - they're not all standard):
strncpy() - includes the target buffer size to avoid overruns. Has the drawback of not always terminating the result string
strncat()
strlcpy() - similar to strncpy(), but intended to be simpler to use and more robust (http://en.wikipedia.org/wiki/Strlcat)
strlcat()
strcpy_s() - Microsoft variants of these functions
strncat_s()
And the API you should strive to use if you can use C++: the std::string class. If you use the C++ std::string class, you pretty much do not have to worry about the buffer containing the string - the class manages all of that for you.
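If you stay in C, one bounded way to build each path is snprintf, which is standard since C99 and always terminates the result when the size is non-zero. A rough sketch, with join_path as a made-up name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "dir/name" into out without ever writing
 * more than out_size bytes. Returns 1 if the whole path fit, 0 if it
 * was truncated (out is still a valid, terminated string then). */
int join_path(char *out, size_t out_size, const char *dir, const char *name)
{
    int n;

    assert(out != NULL && out_size > 0 && dir != NULL && name != NULL);
    /* snprintf returns the length the full result would have had */
    n = snprintf(out, out_size, "%s/%s", dir, name);
    return n >= 0 && (size_t)n < out_size;
}
```

Because the destination is a separate buffer, the tokens produced by strtok are only read, never appended to, which avoids the overwrite described above.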
OK, first of all, be careful. You are losing memory.
Strtok() returns a pointer to the next token and you are storing it in an array of chars.
Instead of char token[80] it should be char *token.
Be careful also when using strtok. strtok practically destroys the char array called pathDeepCopy because it will replace every occurrence of ":" with '\0', as Mike F told you above.
Be sure to initialize pathDeepCopy using memset or calloc.
So when you are coding token[i] there is no way of knowing what is being pointed at.
And as token has no valid data in it, it is likely to throw a core dump because you are trying to concatenate a string to another that has no valid data (token).
Perhaps the thing you are looking for is an array of pointers to char in which to store all the pointers to the tokens that strtok is returning, in which case token will be like char *token[];
Hope this helps a bit.
If you're using C++, consider boost::tokenizer as discussed over here.
If you're stuck in C, consider using strtok_r because it's re-entrant and thread-safe. Not that you need it in this specific case, but it's a good habit to establish.
Oh, and use strdup to create your duplicate string in one step.
replace that with
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":");//get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
// use a new array for storing the new tokens
// pardon my C lang skills. It's been a "while" since I wrote device drivers in C.
// MAX_PATH is assumed to be defined elsewhere (e.g. as PATH_MAX from <limits.h>)
char (*newTokens)[MAX_PATH] = malloc((size_t)(i > 0 ? i : 1) * sizeof *newTokens);
if (newTokens != NULL) {
for (int k = 0; k < i; ++k) {
snprintf(newTokens[k], MAX_PATH, "%s%c", token[k], '/');
printf("%s\n", newTokens[k]); //print out all of the tokens
}
free(newTokens);
}
This avoids overwriting the contents of the original buffer and prevents the core dump.
and don't forget to check if malloc returns NULL!

Resources