Using strtok() in C with a combined word

I'd like to know how to use strtok to find values. Is it possible to call strtok(mystring, ""), or not?
I want to split mystring, which holds %3456, into two parts: "%" and "3456". Is this possible, and how can I do it?

You cannot use strtok for this purpose: strtok will modify its first argument, overwriting the first separator with a '\0'.
Use strspn() or strcspn() to scan for sequences of known characters, and copy the sequences into a separate buffer with memcpy.
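As a rough illustration of that suggestion, here is a minimal sketch that splits a string such as "%3456" into its non-digit prefix and the digit run that follows. The function name split_prefix_number and its parameters are hypothetical, and it assumes the "value" part consists of ASCII digits only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the leading run of non-digit characters of src
 * into prefix, and the digit run that follows it into number. Both outputs
 * are truncated to fit their buffers and are always NUL-terminated.
 * Returns 1 if a non-empty digit run was found, 0 otherwise.
 * src itself is never modified. */
int split_prefix_number(const char *src,
                        char *prefix, size_t prefix_size,
                        char *number, size_t number_size)
{
    static const char digits[] = "0123456789";
    size_t prefix_len, number_len, n;
    const char *num_start;

    if (prefix_size == 0 || number_size == 0)
        return 0;

    prefix_len = strcspn(src, digits);       /* chars before the first digit */
    num_start = src + prefix_len;
    number_len = strspn(num_start, digits);  /* length of the digit run */

    n = prefix_len < prefix_size ? prefix_len : prefix_size - 1;
    memcpy(prefix, src, n);
    prefix[n] = '\0';

    n = number_len < number_size ? number_len : number_size - 1;
    memcpy(number, num_start, n);
    number[n] = '\0';

    return number_len > 0;
}
```

Called with "%3456" and two small char arrays, this leaves "%" in the first buffer and "3456" in the second.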

Related

Word Replace in C, not Substring

Is there a way to replace all occurrences of a word in a string in C with another word? By word, I don't mean substring.
Here's what I want to achieve:
Input String:
OneOne One One OneOneOne One
Word to find:
One
Word to Replace it with:
Forty
Desired Output:
OneOne Forty Forty OneOneOne Forty
There are many example functions for replacing words in a string, e.g. What is the function to replace string in C?
You can use these functions to do what you want because (white)space characters are characters too; see Removing Spaces from a String in C? and https://www.cs.tut.fi/~jkorpela/chars/spaces.html
So for the replace functions it makes a difference whether the search string is
replace = 'One' or replace = ' One '
If you use the second, this should work for words that have a space on both sides. A word at the very start or end of the string, or two matches that share a single space, will not match ' One ' and need separate handling.
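One way to sidestep the problem of surrounding spaces is to walk the string one whitespace-delimited word at a time and compare each whole word. The following is only a sketch of that idea, not one of the linked replace functions: replace_word is a hypothetical name, and "word" is assumed to mean a maximal run of non-whitespace characters, so "One," with trailing punctuation would not match.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src in which every
 * whitespace-delimited word equal to `word` is replaced by `repl`.
 * Whitespace is copied through unchanged. The caller frees the result.
 * Returns NULL if the allocation fails. */
char *replace_word(const char *src, const char *word, const char *repl)
{
    static const char ws[] = " \t\r\n";
    size_t word_len = strlen(word), repl_len = strlen(repl);
    size_t out_len = 0;
    const char *p;
    char *out, *q;

    /* Pass 1: compute the length of the result. */
    for (p = src; *p != '\0'; ) {
        size_t gap = strspn(p, ws);          /* whitespace before the word */
        size_t len = strcspn(p + gap, ws);   /* the word itself */
        int match = len > 0 && len == word_len &&
                    memcmp(p + gap, word, len) == 0;
        out_len += gap + (match ? repl_len : len);
        p += gap + len;
    }

    out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy whitespace as is, and either the word or its replacement. */
    q = out;
    for (p = src; *p != '\0'; ) {
        size_t gap = strspn(p, ws);
        size_t len = strcspn(p + gap, ws);
        memcpy(q, p, gap);
        q += gap;
        if (len > 0 && len == word_len && memcmp(p + gap, word, len) == 0) {
            memcpy(q, repl, repl_len);
            q += repl_len;
        } else {
            memcpy(q, p + gap, len);
            q += len;
        }
        p += gap + len;
    }
    *q = '\0';
    return out;
}
```

With the input from the question, replace_word("OneOne One One OneOneOne One", "One", "Forty") yields "OneOne Forty Forty OneOneOne Forty".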
https://en.wikipedia.org/wiki/Whitespace_character
Unicode stored in C char

C strtok, split string to two part using first space

I have a buffer and two pointers:
char line[MAX_STR];
char *inputCmd,*inputArgs;
and I'm using
inputCmd = strtok(line," ");
I wonder how I can split it into just two parts.
For example:
line = {"COMMAND A PARAMTER TO CHECK..."};
I want
inputCmd to point to "COMMAND"
and inputArgs to point to "A PARAMTER TO CHECK..."
Thanks.
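You don't have to use the same delimiters for every call to strtok.
So if your format is
string1|space|remainder|nul|
you can call strtok with the string and a space as the delimiter, then call it again with NULL for the string argument and an empty delimiter string (""). With no delimiters to look for, the second call returns everything up to the terminating nul.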
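The two-call approach described above can be sketched as follows. The wrapper name split_command is hypothetical; the second call passes an empty delimiter string so that strtok returns whatever remains of the line.

```c
#include <string.h>

/* Hypothetical helper: splits line in place at the first space.
 * *cmd receives the first word (NULL if line contains only spaces or is
 * empty). *args receives everything after the first separator (NULL if
 * nothing follows). line is modified: the separator becomes '\0'. */
void split_command(char *line, char **cmd, char **args)
{
    *cmd = strtok(line, " ");      /* first call: space-delimited token */
    *args = NULL;
    if (*cmd != NULL)
        *args = strtok(NULL, "");  /* empty delimiter set: rest of the string */
}
```

For the line "COMMAND A PARAMTER TO CHECK...", cmd ends up pointing to "COMMAND" and args to "A PARAMTER TO CHECK...". If several spaces follow the command, only the first is consumed, so args would start with the remaining spaces.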

Find words divided by whitespace

Is it possible to use fgets() to save different words divided by whitespace and then find each word?
For example let's say I have this:
char words[100];
fgets(words,100,stdin);
and then I have to find each word to use it in the rest of my program. How can I do that?
You can use strtok_r, or you could use the pcre library if you want to do things with regex.
char *save_ptr;
char *word = strtok_r(words, " \t\n", &save_ptr); /* "\n" drops the newline that fgets() keeps */
and then repeated calls, passing NULL as the first argument,
word = strtok_r(NULL, " \t\n", &save_ptr);
until word == NULL
fgets() will save your input into a string. To divide it into individual words, you can either go through the string (possibly using isalpha() and similar), or use strtok() to get individual words.
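Putting the strtok_r calls into a loop might look like the sketch below. The function split_words and its array-of-pointers interface are assumptions made for illustration; strtok_r is a POSIX function, so this will not build on platforms that lack it.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place on spaces, tabs and newlines.
 * Stores up to max_words pointers into buf in words[] and returns how many
 * were stored. buf is modified: delimiters are overwritten with '\0'. */
size_t split_words(char *buf, char *words[], size_t max_words)
{
    char *save_ptr;
    size_t count = 0;
    char *word = strtok_r(buf, " \t\n", &save_ptr);

    while (word != NULL && count < max_words) {
        words[count++] = word;
        /* Later calls pass NULL so strtok_r continues from save_ptr. */
        word = strtok_r(NULL, " \t\n", &save_ptr);
    }
    return count;
}
```

After fgets(words, 100, stdin), a call such as split_words(words, list, 50) with a char *list[50] array gives each word as its own string; including "\n" in the delimiters removes the newline that fgets() leaves at the end of the line.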

Is there a way to tokenize two strings and go through the tokens in parallel?

Is there a way to tokenize two strings and move through the tokens in parallel? (Since I have two strings, I can't use strtok(NULL, "bar") twice.)
Also, is there a way to make strtok tokenize a string while leaving the original string unmodified?
For example:
void foo(char* form, char* num){
char *templ=form, *tempr=num;
templ = strtok(templ, " ");//but this tokenize form as well
Yes, on most platforms there is. You can use strtok_r, the reentrant version of strtok, which does not store its state in static memory:
char *save1, *save2;
templ = strtok_r(templ, " ", &save1);
tempr = strtok_r(tempr, " ", &save2);
Note that you should generally use strtok_r if it is available, even for parsing a single source of tokens.
If strtok_r is not available, you could resort to using sscanf and keeping track of the position in the string being tokenized. Depending on the complexity of the tokenization task that you are trying to solve, this could provide a viable solution as well.
So, once upon a time, the man page for strtok was much more emphatic about not using it. At any rate:
strtok_r is the reentrant version; that should allow you to go through the tokens in parallel.
As far as I know, it's not possible to make strtok leave the string unmodified; use strcpy to make a copy first and tokenize the copy, so the original stays intact.
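Both suggestions, separate save pointers and tokenizing a copy, can be combined as in this sketch. The function count_matching_tokens and what it computes are made up for illustration, and strdup is used here in place of strcpy to make the copies; both strtok_r and strdup are assumed to be available (they are POSIX functions).

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r and strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: walks the space-separated tokens of form and num in
 * parallel and counts the positions where both tokens are equal. Stops when
 * either string runs out of tokens. The inputs are not modified because
 * only private copies are tokenized. Returns -1 if a copy cannot be made. */
int count_matching_tokens(const char *form, const char *num)
{
    char *copy1 = strdup(form);
    char *copy2 = strdup(num);
    char *save1, *save2, *tok1, *tok2;
    int matches = 0;

    if (copy1 == NULL || copy2 == NULL) {
        free(copy1);
        free(copy2);
        return -1;
    }

    /* Each string has its own save pointer, so the two scans don't clash. */
    tok1 = strtok_r(copy1, " ", &save1);
    tok2 = strtok_r(copy2, " ", &save2);
    while (tok1 != NULL && tok2 != NULL) {
        if (strcmp(tok1, tok2) == 0)
            matches++;
        tok1 = strtok_r(NULL, " ", &save1);
        tok2 = strtok_r(NULL, " ", &save2);
    }

    free(copy1);
    free(copy2);
    return matches;
}
```

The copies cost one allocation per string, which is the price of keeping form and num intact.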

Obtaining zero-length string from strtok()

I have a CSV file containing data such as
value;name;test;etc
which I'm trying to split by using strtok(string, ";"). However, this file can contain zero-length data, like this:
value;;test;etc
which strtok() skips. Is there a way to prevent strtok from skipping zero-length data like this?
A possible alternative is to use the BSD function strsep() instead of strtok(), if available.
From the man page:
The strsep() function is intended as a replacement for the strtok()
function. While the strtok() function should be preferred for
portability reasons (it conforms to ISO/IEC 9899:1990 ("ISO C90"))
it is unable to handle empty fields, i.e., detect fields delimited by
two adjacent delimiter characters, or to be used for more than a
single string at a time. The strsep() function first appeared in
4.4BSD.
A simple example (also copied from that man page):
char *token, *string, *tofree;
tofree = string = strdup("value;;test;etc");
while ((token = strsep(&string, ";")) != NULL)
printf("token=%s\n", token);
free(tofree);
Output:
token=value
token=
token=test
token=etc
so empty fields are handled correctly.
Of course, as others already said, none of these simple tokenizer functions handles
delimiter inside quotation marks correctly, so if that is an issue, you should use
a proper CSV parsing library.
There is no way to make strtok() not behave this way. From man page:
A sequence of two or more contiguous delimiter bytes in the parsed
string is considered to be a single delimiter. Delimiter bytes at the
start or end of the string are ignored. Put another way: the tokens
returned by strtok() are always nonempty strings.
But what you can do is check the number of '\0' characters before the token, on the assumption that strtok() replaces every delimiter it encounters with '\0'. That way you'll know how many fields were skipped. Source info:
This end of the token is automatically replaced by a null-character,
and the beginning of the token is returned by the function.
And a code sample to show what I mean.
char* aStr = ...;
char* ptr = NULL;
ptr = strtok (...);
char* back = ptr;
int count = -1;
do {
back--;
if (back <= aStr) break; // to protect against reads before aStr
count++;
} while (*back == '\0'); // compare with ==; a single = would assign
(written without ide or testing, may be an invalid implementation, but the idea stands).
No you can't.
From "man strtok":
A sequence of two or more contiguous delimiter characters in the
parsed string is considered to be a single delimiter. Delimiter
characters at the start or end of the string are ignored. Put
another way: the tokens returned by strtok() are always nonempty
strings.
You could also run into problems if your data contains the delimiter inside quotes or any other "escape".
I think the best solution is to get a CSV parsing library or write your own parsing function.
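If you go the route of writing your own parsing function, a small strsep-like splitter can be built from standard C alone. The sketch below is one possible shape, with next_field being a hypothetical name; it keeps empty fields but, like strtok and strsep, knows nothing about quoting or escapes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the next field of the string *cursor points
 * to, which may be an empty string, or NULL when no fields are left.
 * The delimiter that ends the field is overwritten with '\0' and *cursor is
 * advanced past it; after the last field *cursor is set to NULL. */
char *next_field(char **cursor, const char *delims)
{
    char *start = *cursor;
    char *end;

    if (start == NULL)
        return NULL;

    end = start + strcspn(start, delims);  /* first delimiter, or the end */
    if (*end != '\0') {
        *end = '\0';
        *cursor = end + 1;   /* an adjacent delimiter yields "" next time */
    } else {
        *cursor = NULL;      /* this was the last field */
    }
    return start;
}
```

Starting with a writable buffer holding "value;;test;etc" and a cursor pointing at it, repeated calls with ";" return "value", "", "test" and "etc", then NULL.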
From recent experience, it looks like strtok() does not necessarily replace all delimiters with the end of string characters, but rather replaces the first delimiter it finds with an end of string character and skips the following delimiters but leaves them in place.
This means that in the nominal case (no zero-length strings before delimiters), every call to strtok() after the first call to strtok() will return a pointer to a string that begins after a \0 character.
In the case where strtok() reads zero-length strings between delimiters, strtok() will return a pointer to a string that begins after a delimiter character that has not been replaced with \0.
Here is my solution for finding out whether strtok() has skipped a zero-length string between delimiters.
// Previous code is needed to point strtok to a string and start ingesting from it.
char * field_string = strtok(NULL, ","); // the delimiter argument must be a string, not a char
// Note that this can't be done after the first call to strtok for a given buffer, since the previous character would be outside of the string's memory space.
if (*(field_string-1) == '\0') {
// no delimiters were skipped
} else {
// one or more delimiters were skipped
}

Resources